[tickets] [opensaf:tickets] #248 "amf: Incorrect return code from saAmfComponentErrorReport_4 () and saAmfComponentErrorClear_4()".
- **status**: invalid --> unassigned
- **Comment**:

This issue is observed when the ErrorReport or ErrorClear API is called from a standalone AMF executable rather than from within an AMF component. To reproduce the issue:

1) Call saAmfInitialize().
2) Call saAmfFinalize() with the handle obtained in the first step.
3) Call the ErrorReport API with the finalized handle.

The proper return value is obtained if the ErrorReport API is called from a component that is spawned by AMF, where invoking saAmfComponentRegister() is mandatory as part of initialization.

---

** [tickets:#248] "amf: Incorrect return code from saAmfComponentErrorReport_4 () and saAmfComponentErrorClear_4()".**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Thu May 16, 2013 06:41 AM UTC by Praveen
**Last Updated:** Mon Nov 07, 2016 05:41 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2817.

Changeset:3728

When saAmfComponentErrorReport_4() and saAmfComponentErrorClear_4() are called after finalizing the amfHandle (by calling saAmfFinalize()), both of them return SA_AIS_ERR_VERSION instead of SA_AIS_ERR_BAD_HANDLE.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
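The expectation in this ticket (a finalized handle yields SA_AIS_ERR_BAD_HANDLE, not SA_AIS_ERR_VERSION) can be illustrated with a small model. This is only a sketch and not OpenSAF code: `FakeAmfAgent`, its method names, and the `MIN_VERSION_FOR_4` constant are hypothetical, and the assumption that the `_4` calls need at least version B.04.01 is mine. The point it shows is the ordering: the handle is validated before any check that depends on data stored under that handle.

```python
from enum import IntEnum


class SaAisErrorT(IntEnum):
    # Numeric values follow the SA Forum saAis.h header.
    SA_AIS_OK = 1
    SA_AIS_ERR_VERSION = 3
    SA_AIS_ERR_BAD_HANDLE = 9


# Assumption for illustration: the _4 variants need at least this version.
MIN_VERSION_FOR_4 = ("B", 4, 1)


class FakeAmfAgent:
    """Hypothetical stand-in for an AMF agent library; not OpenSAF code."""

    def __init__(self):
        self._next_handle = 1
        self._versions = {}  # live handle -> version passed to initialize()

    def initialize(self, version):
        handle = self._next_handle
        self._next_handle += 1
        self._versions[handle] = version
        return handle

    def finalize(self, handle):
        if handle not in self._versions:
            return SaAisErrorT.SA_AIS_ERR_BAD_HANDLE
        del self._versions[handle]
        return SaAisErrorT.SA_AIS_OK

    def component_error_report_4(self, handle):
        # Validate the handle first: a finalized handle has no stored
        # version, so a version check done first could misreport it.
        version = self._versions.get(handle)
        if version is None:
            return SaAisErrorT.SA_AIS_ERR_BAD_HANDLE
        if version < MIN_VERSION_FOR_4:
            return SaAisErrorT.SA_AIS_ERR_VERSION
        return SaAisErrorT.SA_AIS_OK
```

With this ordering, the three reproduction steps (initialize, finalize, report on the finalized handle) end in SA_AIS_ERR_BAD_HANDLE, which is the return code the reporter expects.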
[tickets] [opensaf:tickets] #211 saEvtChannelUnlink API returns SA_AIS_ERR_LIBRARY in a corner case.
Apart from ERR_LIBRARY, the saEvtChannelUnlink API also returns SA_AIS_ERR_NOT_EXIST.

---

** [tickets:#211] saEvtChannelUnlink API returns SA_AIS_ERR_LIBRARY in a corner case.**

**Status:** unassigned
**Milestone:** future
**Created:** Wed May 15, 2013 07:09 AM UTC by Mathi Naickan
**Last Updated:** Wed Jul 15, 2015 02:50 PM UTC
**Owner:** nobody
**Attachments:**

- [osafevtd](https://sourceforge.net/p/opensaf/tickets/211/attachment/osafevtd) (178.2 kB; application/octet-stream)

Setup: SLES 11 64bit VM setup.

Test Scenario:
1. Invoke saEvtInitialize.
2. Open Channel as CREATE and PUBLISHER
3. Allocate, AttributeSet and Free the Event
4. Close and Unlink the Channel
5. Finalize Evt session.

Observed from /var/log/messages that the runtime object delete fails for the evt channel.

Oct 4 18:33:31 linux-b4xy osafevtd[4588]: saImmOiRtObjectDelete failed. Channel: safChnl=channel_37. rc = 12

From osafevtd logs:

Oct 4 18:33:31.846959 osafevtd [4588:eds_evt.c:1079] >> eds_proc_eda_api_msg
Oct 4 18:33:31.846974 osafevtd [4588:eds_evt.c:0444] >> eds_proc_chan_unlink_msg: agent dest: 20100310f003d
Oct 4 18:33:31.846989 osafevtd [4588:eds_ll.c:1669] >> eds_channel_unlink: channel name: safChnl=channel_37
Oct 4 18:33:31.847004 osafevtd [4588:eds_ll.c:1674] TR Use count: 0
Oct 4 18:33:31.847019 osafevtd [4588:eds_ll.c:0505] >> is_active_channel: chan_name: safChnl=channel_37
Oct 4 18:33:31.847034 osafevtd [4588:eds_ll.c:0513] << is_active_channel: true: channel is not marked as unlinked
Oct 4 18:33:31.847050 osafevtd [4588:eds_ll.c:1678] TR Setting the unlink flag for this channel
Oct 4 18:33:31.847064 osafevtd [4588:eds_ll.c:0389] >> eds_remove_cname_rec: chan_name: safChnl=channel_37
Oct 4 18:33:31.847080 osafevtd [4588:eds_ll.c:0410] << eds_remove_cname_rec
Oct 4 18:33:31.847095 osafevtd [4588:eds_ll.c:1684] TR Use count is zero, delete the and IMM object
Oct 4 18:33:31.848039 osafevtd [4588:eds_ll.c:1689] ER saImmOiRtObjectDelete failed. Channel: safChnl=channel_37. rc = 12
Oct 4 18:33:31.848058 osafevtd [4588:eds_ll.c:1690] << eds_channel_unlink
Oct 4 18:33:31.848073 osafevtd [4588:eds_evt.c:0449] TR Channel unlink failed for :20100310f003d

Changeset: 2852

Note: This issue is observed when the scenario is run in batch mode.
[tickets] [opensaf:tickets] #2377 AMF: SG in unstable state after couple of admin operations during headless scenario
- **status**: assigned --> duplicate
- **Comment**:

The application doesn't have any si-si deps configured. The issue is the same as the one observed in #2105, where the application responded during link loss time. Closing this ticket as a duplicate of #2105.

---

** [tickets:#2377] AMF: SG in unstable state after couple of admin operations during headless scenario**

**Status:** duplicate
**Milestone:** 5.2.RC2
**Created:** Wed Mar 15, 2017 04:54 AM UTC by Srikanth R
**Last Updated:** Mon Mar 27, 2017 09:12 PM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2377/attachment/logs.tgz) (7.6 MB; application/x-compressed)

Changeset : 8634 5.2.FC
Setup : 2 controllers with 3 payloads (Headless feature enabled)
AMF application : 2n application, 2 SUs, 4 SIs (si-si deps disabled)

Steps performed :
-> Initially brought up 5 nodes.
-> Deployed the attached configuration.
-> Performed admin operations on SG coupled with 2 headless operations.
-> Later performed shutdown operation of SG, which resulted in unstable state.

Attached logs :
-> syslog, amfd and amfnd traces of both controllers and PL-3.
-> AMF application
[tickets] [opensaf:tickets] #2372 amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.
Since the start of the CLM implementation, the service has not supported admin operations on more than one node simultaneously. There was a discussion (or ticket) on the earlier trac ticket system stating that CLM doesn't support operations on two entities simultaneously.

Below is a simple scenario to reproduce.

-> Bring up a CLM agent and subscribe to the track callback. Do not respond to the START callback.
-> Now perform the CLM lock operation on the two payloads in two different terminals.
-> In the CLM application, respond to the callbacks only after invoking both admin operations.
-> Both admin operations shall result in SA_AIS_ERR_REPAIR_PENDING return code.

From the below output in syslog, it seems that CLM doesn't store the invocation id for the initial admin op.

Mar 15 11:54:20 SLES-1 osafamfd[3276]: NO Pending Response sent for CLM track callback::OK '7'

---

** [tickets:#2372] amf/clm: CLM lock of two more nodes returns REPAIR_PENDING for first node.**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Tue Mar 14, 2017 09:29 AM UTC by Praveen
**Last Updated:** Tue Mar 14, 2017 09:29 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafamfd) (3.4 MB; application/octet-stream)
- [osafclmd](https://sourceforge.net/p/opensaf/tickets/2372/attachment/osafclmd) (860.9 kB; application/octet-stream)

Steps to reproduce:
1) Bring a 4-node cluster up.
2) Deploy the AMF demo on PL-3 and PL-4.
3) LOCK the AMF nodes PL-3 and PL-4.
4) Make arrangements so that termination of amf_demo on PL-3 takes more time compared to PL-4.
5) From one terminal issue CLM lock of PL-3 first and, right after it, issue CLM lock of PL-4.

CLM and AMF traces are attached.

Analysis:
When AMFD gets the CLM track callback for PL-3, it starts terminating amf_demo on PL-3. While termination of amf_demo is still going on, AMF gets another track callback with rootCauseEntity as PL-4. However, this callback also contains information about PL-3. AMFD starts terminating amf_demo on PL-4, but at the same time it responds for PL-3 with the invocation id of the PL-4 callback. CLM assumes that the PL-4 change_started step has completed and sends the completion callback for PL-4. In this callback, AMF clears the internal flags which monitor the graceful removal of nodes.
Since AMF never responded to the PL-3 callback, the callback timer expires in CLMD and it sends the complete callback to AMF. AMF thinks this is a case of node failover and tries to fail over PL-3.

Note: In all these stages, CLM sends the track callback with information about all the nodes. The params AMF registers with are: SA_TRACK_CURRENT|SA_TRACK_CHANGES_ONLY|SA_TRACK_VALIDATE_STEP|SA_TRACK_START_STEP.

I am still evaluating whether the issue is in CLM or AMF. Since AMF registers for **|SA_TRACK_CHANGES_ONLY|**, should CLM give information about all the nodes in all subsequent callbacks? Also, AMF should respond to a callback only when it has completed termination of comps.
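The mix-up described in the analysis (answering the PL-3 step with the invocation id that belongs to the PL-4 callback) can be pictured with a small bookkeeping model. This is a hedged sketch only, not the AMFD implementation: the class `PendingClmResponses` and its methods are hypothetical, and the invocation ids 7 and 8 are made-up illustrative values. It shows one possible approach, keeping one pending invocation id per root-cause node so that each response goes out with the id of its own callback.

```python
class PendingClmResponses:
    """Hypothetical per-node bookkeeping for CLM START-step callbacks."""

    def __init__(self):
        # root-cause node name -> invocation id that must be answered
        self._pending = {}

    def on_start_step(self, root_cause_node, invocation):
        # Record the id only for the node that caused this callback, even
        # if the notification buffer also lists other nodes. An id that is
        # already stored for the node is never overwritten.
        self._pending.setdefault(root_cause_node, invocation)

    def on_termination_done(self, node):
        # Return the id to respond with for this node, or None if no
        # response is owed (already answered or never requested).
        return self._pending.pop(node, None)


tracker = PendingClmResponses()
tracker.on_start_step("PL-3", 7)   # lock of PL-3 arrives first
tracker.on_start_step("PL-4", 8)   # lock of PL-4 arrives while PL-3 is busy
```

In this model, when termination on PL-4 finishes first, `on_termination_done("PL-4")` yields 8 and the entry for PL-3 stays pending with 7 until its own termination completes.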
[tickets] [opensaf:tickets] #2351 Opensaf failed to start on CLM locked node
- **summary**: IMM: 2PBE: Opensaf failed to start on standby controller --> Opensaf failed to start on CLM locked node
- Description has changed:

Diff:

--- old
+++ new
@@ -5,7 +5,7 @@
 2PBE enable with no load

 Summary:
-OpenSAF failed to start on standby controller when 2PBE is enabled
+OpenSAF failed to start on standby controller when 2PBE is enabled and standby is in CLM locked state

 Step performed:
 1. Enabled 2PBE in immd.conf for both controllers

- **status**: invalid --> unassigned
- **Component**: imm --> base
- **Version**: --> 5.2.FC
- **Comment**:

Re-opening the ticket. Either it should be documented that opensafd shall fail to start on a CLM locked node, or opensafd should be started on the CLM locked node with all services returning SA_AIS_ERR_UNAVAILABLE to the applications.

---

** [tickets:#2351] Opensaf failed to start on CLM locked node**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Tue Mar 07, 2017 08:41 AM UTC by Chani Srivastava
**Last Updated:** Tue Mar 07, 2017 10:52 AM UTC
**Owner:** nobody
**Attachments:**

- [Logs2PBE.zip](https://sourceforge.net/p/opensaf/tickets/2351/attachment/Logs2PBE.zip) (1.2 MB; application/zip)

Environment details
OS : Suse 64bit
Changeset : 8603( 5.2.MO-1)
2PBE enable with no load

Summary:
OpenSAF failed to start on standby controller when 2PBE is enabled and standby is in CLM locked state

Step performed:
1. Enabled 2PBE in immd.conf for both controllers
2. Started opensaf on all nodes sequentially
[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects
To my understanding, this ticket was raised to correct the invalid return code (ERR_LIBRARY). As per the ticket description, the expected behavior is

" Expected behavior - Proper return code with error string should be returned "

What is the necessity of a new ticket ?

---

** [tickets:#2284] IMM: Improper return code without any error string while deleting large number of objects**

**Status:** invalid
**Milestone:** 5.2.RC1
**Created:** Wed Feb 01, 2017 07:13 AM UTC by Chani Srivastava
**Last Updated:** Thu Mar 09, 2017 01:15 PM UTC
**Owner:** nobody

Steps to reproduce:
1. Bring up opensaf on a cluster
2. Create around 10k objects
3. Try deleting these objects in one immcfg operation

Output:
Error Returned - error - saImmOmAdminOwnerSet FAILED: SA_AIS_ERR_LIBRARY (2)
No error string stating the cause of failure is returned.

Syslog - immcfg: ER TOO MANY Object Names line:733

Expected behavior - Proper return code with error string should be returned
[tickets] [opensaf:tickets] #2339 CLM : Cluster reset doesn't succed as "reboot now" command fails on SLES
Either "shutdown -r now" or a simple "reboot" command should suffice for a graceful reboot.

---

** [tickets:#2339] CLM : Cluster reset doesn't succed as "reboot now" command fails on SLES**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 06:05 AM UTC by Srikanth R
**Last Updated:** Fri Mar 03, 2017 06:05 AM UTC
**Owner:** nobody

Changeset : 8634 5.2.FC
SLES TIPC setup with one controller.

Once the controller is brought up with opensaf 5.2.FC, the following cluster reset command is issued.

immadm -o 4 safCluster=myClmCluster

The command failed with the following log.

Mar 11 16:09:52 SUSE-S1-C1 osafclmd[6772]: Command: /usr/lib64/opensaf/opensaf_reboot 0 not_used 1 failed, rc = 256

On SLES, the "reboot now" command fails. Instead, "shutdown -r now" should be invoked for a graceful shutdown.
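The suggestion above amounts to not depending on "reboot now" being accepted on every distribution. The sketch below illustrates that idea only; it is not the opensaf_reboot script. The function name `graceful_reboot`, the `run` parameter, and the command order are assumptions made for illustration. It tries each candidate command in turn and stops at the first one that exits with status 0.

```python
import subprocess

# Candidates taken from the comment above: plain "reboot" first,
# then "shutdown -r now". The order is an assumption.
REBOOT_COMMANDS = (["reboot"], ["shutdown", "-r", "now"])


def graceful_reboot(run=subprocess.call, commands=REBOOT_COMMANDS):
    """Try each reboot command until one exits with status 0.

    `run` is injectable so the logic can be exercised without rebooting;
    it must accept an argv list and return the exit status.
    Returns the argv that succeeded, or None if every command failed.
    """
    for argv in commands:
        try:
            if run(argv) == 0:
                return argv
        except OSError:
            # Command missing or not executable: try the next candidate.
            continue
    return None


def fake_run(argv):
    # Stand-in runner for a dry run: pretend "reboot" fails on this host.
    return 1 if argv == ["reboot"] else 0
```

Calling `graceful_reboot(run=fake_run)` exercises the fallback path without touching the machine; with the default `run` the function would really invoke the commands, so it should only be called where a reboot is intended.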
[tickets] [opensaf:tickets] #2327 Opensaf failed to start on active controller ( random)
- **status**: unassigned --> invalid
- **Comment**:

Closing the ticket as invalid, as an older library file was left behind during re-installation. After a proper installation, there is no issue with opensafd startup.

---

** [tickets:#2327] Opensaf failed to start on active controller ( random)**

**Status:** invalid
**Milestone:** 5.2.RC1
**Created:** Wed Mar 01, 2017 06:22 AM UTC by Srikanth R
**Last Updated:** Wed Mar 01, 2017 06:22 AM UTC
**Owner:** nobody
**Attachments:**

- [opensafStartup.tgz](https://sourceforge.net/p/opensaf/tickets/2327/attachment/opensafStartup.tgz) (1.4 MB; application/x-compressed-tar)

Changeset: 8634 5.2.FC
SLES single node TIPC setup.

Issue : opensafd failed to start up on the active controller for the first time.

Below is the output from syslog

Mar 6 01:27:19 SUSE-S1-C1 opensafd[11180]: NO Monitoring of CLMD started
Mar 6 01:27:19 SUSE-S1-C1 osafclmna[11211]: NO safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Mar 6 01:27:19 SUSE-S1-C1 osafamfd[11301]: Started
Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: WA saClmInitialize_4 returned 5
Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER saImmOiInitialize failed 5
Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER avd_imm_init FAILED
Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize_for_assignment FAILED 2
Mar 6 01:27:29 SUSE-S1-C1 osafamfd[11301]: ER initialize failed, exiting
Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Failed DESC:AMFD
Mar 6 01:27:29 SUSE-S1-C1 opensafd[11180]: ER Going for recovery

Below is the output from clmd.

Mar 6 1:27:29.273608 osafclmd [11291:src/clm/clmd/clms_mds.c:1194] << clms_mds_svc_event
Mar 6 1:27:29.273644 osafclmd [11291:src/mbc/mbcsv_mds.c:0420] << mbcsv_mds_evt: Msg is not from same vdest, discarding
Mar 6 1:27:29.269263 osafclmd [11291:src/imm/agent/imma_oi_api.cc:2783] << rt_object_update_common
Mar 6 1:27:29.273697 osafclmd [11291:src/clm/clmd/clms_imm.c:0842] IN saImmOiRtObjectUpdate failed for cluster object with rc = 5. Trying again
Mar 6 1:27:29.273709 osafclmd [11291:src/clm/clmd/clms_imm.c:0871] << clms_cluster_update_rattr

Traces of clmd, amfd, amfnd, immd and immnd along with mds.log and syslog are attached.

This issue is random. It was observed two times out of three when started on a lone active controller.
[tickets] [opensaf:tickets] #252 amf: saAmfPmStart_3() returns SA_AIS_ERR_ACCESS when called with invalid recovery.
Yes. The ticket can be closed as invalid, as the API can return ERR_ACCESS.

---

** [tickets:#252] amf: saAmfPmStart_3() returns SA_AIS_ERR_ACCESS when called with invalid recovery.**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Thu May 16, 2013 06:49 AM UTC by Praveen
**Last Updated:** Fri Nov 04, 2016 09:52 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2813.

Changeset:3728

When saAmfPmStart_3() is called with an invalid value of SaAmfRecommendedRecoveryT (say 19), it returns SA_AIS_ERR_ACCESS instead of SA_AIS_ERR_INVALID_PARAM. SA_AIS_ERR_ACCESS should be returned when AMF rejects the recommended recovery from a functionality perspective and should not be returned as the result of a validation check.
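The distinction the reporter draws, between a value that is not a recommended recovery at all and a valid recovery that AMF declines, can be written down as two separate checks. This is a sketch of the reporter's expectation only, not of what saAmfPmStart_3() does (the closing comment above accepts ERR_ACCESS). The function name `check_recommended_recovery`, the `permitted` parameter, and the assumed valid range 1..9 for SaAmfRecommendedRecoveryT are illustrative assumptions.

```python
from enum import IntEnum


class SaAisErrorT(IntEnum):
    # Numeric values follow the SA Forum saAis.h header.
    SA_AIS_OK = 1
    SA_AIS_ERR_INVALID_PARAM = 7
    SA_AIS_ERR_ACCESS = 11


# Assumption: SaAmfRecommendedRecoveryT enumerators span 1..9.
RECOVERY_MIN, RECOVERY_MAX = 1, 9


def check_recommended_recovery(recovery, permitted):
    """Classify a recommended-recovery value.

    `permitted` is a hypothetical set of recoveries that AMF would accept
    for the caller in the current configuration.
    """
    # Step 1: pure validation. Out-of-range values are bad parameters.
    if not RECOVERY_MIN <= recovery <= RECOVERY_MAX:
        return SaAisErrorT.SA_AIS_ERR_INVALID_PARAM
    # Step 2: functional check. A valid value may still be refused.
    if recovery not in permitted:
        return SaAisErrorT.SA_AIS_ERR_ACCESS
    return SaAisErrorT.SA_AIS_OK
```

Under these assumptions the value 19 from the ticket fails the first check, while a valid but disallowed recovery only fails the second.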
[tickets] [opensaf:tickets] Re: #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster
For scenario-2:

-> Management software other than opensafd (e.g. SWN) issued a reboot on the standby controller. From the opensaf perspective, the standby controller might be a healthy member of the cluster. But from the SWN perspective, the node needs to be repaired and a reboot is invoked.
-> When the reboot command is invoked by SWN, all services in the configured runlevel shall be stopped in order.
-> Once the opensafd stop script is invoked on the standby controller, active controller detects that the standby controller is in healthy state and remote fencing shall be done.
-> As part of remote fencing, the node shall be hard rebooted, which doesn't give the other services in the runlevel a chance to be stopped gracefully.
-> If the SWN has a database service (e.g. drbd) which is to be stopped after opensafd stop, the database service stop script shall not be invoked, as remote fencing is done. This may result in a bad state for the other management software, e.g. SWN.

Suggestion :
1) Either opensaf shall document that the admin needs to perform a clm admin lock of the standby controller before repairing.
OR
2) FM should detect the difference between opensafd stop and hung opensaf processes. As part of opensafd stop, the peer fmd on the standby controller can inform fmd on the active controller that opensafd on the standby is going down gracefully.

---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Tue Nov 08, 2016 11:49 AM UTC
**Owner:** nobody

OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers)
Remote fencing enabled

Steps:
1. Bring up OpenSaf on two nodes
2. Enable STONITH
3. Stop opensaf on Standby

Active controller triggers reboot of standby

SC-1 Syslog

Oct 5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, dest:565215202263055)
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for nodeId:2020f pid:3579
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 2020f(down)> (@safAmfService2020f)
Oct 5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct 5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name = SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60**
**Oct 5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was stopped**
Oct 5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct 5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct 5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was started
Oct 5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, dest:565217457979407)
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY Controller at 2020f
Oct 5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently coord) requests sync
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 epoch:0
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling epoch:4
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct 5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 18430
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 3 new epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 0 new epoch:4
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 (MsgQueueService131599) <467, 2010f>
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 2010f> (MsgQueueServi
[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster
There are two scenarios in which "opensafd stop" is invoked on an OpenSAF controller.

SCENARIO-1) The /etc/init.d/opensafd script is invoked manually from the command prompt while the system is up and running.
SCENARIO-2) Software on a controller (other than opensafd) invokes "reboot", for which "opensafd stop" is invoked in run level 3 or higher.

With the patch submitted for #2160:
a) In scenario 1, the node will reboot if the administrator does not invoke the CLM admin operation. This is fine.
b) In scenario 2, the run-level services will not be stopped gracefully, because the node is rebooted abruptly after "opensafd stop", as the admin did not invoke the CLM admin operation.

So, with the #2160 fix, will opensafd, as HA software, not support a graceful reboot of the standby controller?

---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Wed Nov 02, 2016 11:40 AM UTC
**Owner:** nobody

OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers)
Remote fencing enabled

Steps:
1. Bring up OpenSaf on two nodes
2. Enable STONITH
3. Stop opensaf on Standby

Active controller triggers reboot of standby

SC-1 Syslog

Oct 5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, dest:565215202263055)
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for nodeId:2020f pid:3579
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 2020f(down)> (@safAmfService2020f)
Oct 5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct 5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name = SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60**
**Oct 5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was stopped**
Oct 5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct 5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct 5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was started
Oct 5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, dest:565217457979407)
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY Controller at 2020f
Oct 5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently coord) requests sync
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 epoch:0
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling epoch:4
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct 5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 18430
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 3 new epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 0 new epoch:4
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 (MsgQueueService131599) <467, 2010f>
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 2010f> (MsgQueueService131599)
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct 5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, dest:13)
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f with role STANDBY
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 0x2020f with role STANDBY
[tickets] [opensaf:tickets] #2151 osaf: system in not in correct state during Act controller comming up
There are three issues in the ticket as raised.

1) As per the comments on ticket #2094, "/etc/init.d/opensafd stop" is not a proper way to bring down OpenSAF. To bring down a faulty node, the suggestion is to perform a CLM lock on the node and then invoke the reboot command manually.
2) I cannot think of any real use case for a concurrent 'opensafd stop' on one controller and 'opensafd start' on another controller. In a fault scenario, reboot -f is called, so none of the runlevel services are invoked during the node recovery process. So the scenario of a simultaneous 'opensafd stop' on SC-1 and 'opensafd start' on SC-2 is not possible in a production environment.
3) Deploying such a large number of components on a controller is not recommended, as a failure or fault of user components can impact middleware (OpenSAF) functionality on the entire cluster.

---

** [tickets:#2151] osaf: system in not in correct state during Act controller comming up**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 06:59 AM UTC
**Owner:** nobody

Steps to reproduce:
1. Start two controllers (SC-1 active, SC-2 standby) and two payloads. Configure 50 components on SC-2 and unlock them. Keep a 1 sec delay in each component's stop script.
2. Stop SC-1 and, after that, stop SC-2.
3. While SC-2 is going down, start SC-1.

Observed behaviour:
Since the components take time to stop during 'opensafd stop' on SC-2, Amfnd has not exited. However, all middleware component assignments have been stopped; only Amfnd and Amfd are alive, with a few more components left to stop. Meanwhile SC-1 has come up as far as Amfd, and since two Amfds are now active, the SC-2 Amfd exits with "Duplicate ACTIVE detected, exiting". Until this point, the services, including Amfd, are in a bad state because they cannot tell whether this is a headless state or a failover. That is accurate, as the system is halfway between headless and failover.

Expected behaviour:
In my view, FMS should stop and not proceed if the peer is going down, i.e. FMS on SC-1 should figure out that the peer system is going down, and should allow SC-1 to proceed only when all services are down, i.e. when it gets node down (maybe cb->immd_down && cb->immnd_down && cb->amfnd_down && cb->amfd_down && cb->fm_down).
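The condition proposed under "Expected behaviour" can be sketched as a single predicate. This is only an illustration in Python: the flag names are copied from the expression in the ticket, while the PeerDownFlags class and the peer_fully_down function are hypothetical and are not part of FMS.

```python
from dataclasses import dataclass, fields


@dataclass
class PeerDownFlags:
    """Hypothetical mirror of the cb->*_down flags named in the ticket."""
    immd_down: bool = False
    immnd_down: bool = False
    amfnd_down: bool = False
    amfd_down: bool = False
    fm_down: bool = False


def peer_fully_down(cb: PeerDownFlags) -> bool:
    """True only when every tracked peer service has been reported down."""
    return all(getattr(cb, f.name) for f in fields(cb))


# Peer is half way down: IMM is gone, but amfnd, amfd and fm are still alive.
cb = PeerDownFlags(immd_down=True, immnd_down=True)
print(peer_fully_down(cb))  # False, so the starting controller would wait
```

With such a check, the starting controller would be held back for as long as any of the peer services is still alive, which is the half-headless, half-failover window described in the observed behaviour.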
[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
Zoran,

Node reboot recovery should be used when the system cannot recover from the observed fault. For a fault such as amfd crashing, node reboot is appropriate. But in the current scenario, the same configuration exists after the reboot, and the node will reboot again because opensafd is enabled in the runlevel by default. If the system has the same environment after reboot, rebooting does not help the user or the system recover from a misconfiguration, or even from a fault.

My expectation is that the node should not reboot, and that opensafd should either keep running in a suspended state or simply be stopped. This issue mainly affects newcomers. Rebooting a node because of a misconfiguration when starting OpenSAF does not look good.

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Nov 01, 2016 07:26 AM UTC
**Owner:** nobody

# Environment details
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)

# Summary
Controller able to join with invalid node_name

# Steps followed & Observed behaviour
1. Mistakenly configured the controller node_name as PL-3. The remaining configuration files are properly installed and updated, apart from /etc/opensaf/node_name.
2. Brought up OpenSAF. OpenSAF is still able to come up with the misconfigured node_name.

Opensaf status:
fos1:/opt/goahead/tetware/opensaffire/suites/avsv/api/suites # /etc/init.d/opensafd status
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)

# Expected
OpenSAF should come up with only SC-1 / SC-2, as immxml was generated with:
./immxml-clustersize -s 2 -p 2
./immxml-configure
[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
I think the discussion was sidetracked by the use of the PL string in nodes.cfg.

On the first node in the OpenSAF cluster, the following information is filled in in the OpenSAF cfg files.

cat /usr/share/opensaf/immxml/nodes.cfg
SC node-1 node-1
SC node-2 node-2
PL node-3 node-3
PL node-4 node-4
PL node-5 node-5
PL node-6 node-6

cat /etc/opensaf/slot_id
1

cat /etc/opensaf/node_name
node-3

cat /etc/opensaf/node_type
controller

-> Opensafd starts successfully, but with the following output:
safSISU=safSu=node-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)

-> After a time gap of 5 minutes, the node rebooted with the following output:
Nov 1 12:31:22 CONTROLLER-1 osaffmd[3945]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Activation timer supervision expired: no ACTIVE assignment received within the time limit, OwnNodeId = 131343, SupervisionTime = 60
Nov 1 12:31:22 CONTROLLER-1 opensaf_reboot: Rebooting local node; timeout=60

Observed behavior : If the user mistakenly populates node_name with a payload's node name and starts the opensafd script, the user is not informed about the misconfiguration. The node reboots continuously, as opensafd is enabled in the runlevels by default during RPM installation.

Expected behavior : Either fms / imm / amf should detect that the node_name used for bring-up is intended for a payload and not for a controller. More importantly, the node should not reboot.

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 05:49 PM UTC
**Owner:** nobody

# Environment details
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)

# Summary
Controller able to join with invalid node_name

# Steps followed & Observed behaviour
1. Mistakenly configured the controller node_name as PL-3. The remaining configuration files are properly installed and updated, apart from /etc/opensaf/node_name.
2. Brought up OpenSAF. OpenSAF is still able to come up with the misconfigured node_name.

Opensaf status:
fos1:/opt/goahead/tetware/opensaffire/suites/avsv/api/suites # /etc/init.d/opensafd status
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)

# Expected
OpenSAF should come up with only SC-1 / SC-2, as immxml was generated with:
./immxml-clustersize -s 2 -p 2
./immxml-configure
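The detection asked for under "Expected behavior" could look roughly like the pre-start check below. It is a sketch only, not existing OpenSAF code: it assumes nodes.cfg lines have the form "<SC|PL> <node name> <host name>" as shown in the comment above, and that node_type holds either "controller" (as shown) or "payload" (an assumption). The function names are hypothetical.

```python
# Assumed mapping from the node_type file content to the nodes.cfg role column.
EXPECTED_ROLE = {"controller": "SC", "payload": "PL"}


def parse_nodes_cfg(text):
    """Map node name -> 'SC' or 'PL' from nodes.cfg-style lines."""
    roles = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("SC", "PL"):
            roles[parts[1]] = parts[0]
    return roles


def check_node_config(nodes_cfg_text, node_name, node_type):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    roles = parse_nodes_cfg(nodes_cfg_text)
    name = node_name.strip()
    wanted = EXPECTED_ROLE.get(node_type.strip())
    if wanted is None:
        problems.append(f"unknown node_type {node_type.strip()!r}")
    if name not in roles:
        problems.append(f"node_name {name!r} is not listed in nodes.cfg")
    elif wanted is not None and roles[name] != wanted:
        problems.append(
            f"node_name {name!r} is a {roles[name]} in nodes.cfg, "
            f"but node_type says {node_type.strip()!r}"
        )
    return problems


cfg = "SC node-1 node-1\nSC node-2 node-2\nPL node-3 node-3\n"
for problem in check_node_config(cfg, "node-3", "controller"):
    print(problem)
```

Run against the files quoted in the comment, a check like this would flag the mismatch before opensafd starts, so the start could be refused with a message instead of ending in a reboot loop.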
[tickets] [opensaf:tickets] #1765 ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
Apart from the ERR_LIBRARY return value, CKPT open also fails randomly with ERR_NO_RESOURCES after a failover.

---

** [tickets:#1765] ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- [ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2) (3.2 MB; application/x-bzip)

setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed :
The saCkptCheckpointOpen API call failed and returned SA_AIS_ERR_LIBRARY after a couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is the snippet of the agent trace:

Apr 15 8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: retval = 1
Apr 15 8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with return value:2,ckptHandle:63
Apr 15 8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.
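When reading agent traces like the one above, it helps to turn the numeric return codes back into SaAisErrorT names. The helper below is a small sketch, not part of OpenSAF; the table lists only a subset of codes, with values as defined in the SA Forum saAis.h header, and the regular expression is an assumption based on the two trace formats quoted in this ticket.

```python
import re

# Subset of SaAisErrorT values (SA Forum AIS, saAis.h).
SA_AIS_ERROR_NAMES = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
    18: "SA_AIS_ERR_NO_RESOURCES",
    31: "SA_AIS_ERR_UNAVAILABLE",
}

# Matches "API return code = 2" and "return value:2" as seen in the cpa trace.
_RC_PATTERN = re.compile(r"return (?:code|value)\s*[:=]\s*(\d+)")


def decode_rc(trace_line):
    """Return the SaAisErrorT name found in a trace line, or None."""
    match = _RC_PATTERN.search(trace_line)
    if match is None:
        return None
    code = int(match.group(1))
    return SA_AIS_ERROR_NAMES.get(code, f"unlisted code {code}")


print(decode_rc("<< saCkptCheckpointOpen: API return code = 2"))
```

The "retval = 1" in the cpa_mds_msg_sync_send line is deliberately not matched: it is the internal send status rather than an AIS return code.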
[tickets] [opensaf:tickets] #2110 AMF : amfd aborted on both controllers after opensafd stopped on payload
---

** [tickets:#2110] AMF : amfd aborted on both controllers after opensafd stopped on payload**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Oct 11, 2016 05:35 AM UTC by Srikanth R
**Last Updated:** Tue Oct 11, 2016 05:35 AM UTC
**Owner:** nobody

Changeset : 5.1GA 8190
Setup : 4-node setup with PBE enabled (100,000 objects) and the headless feature enabled.

Steps performed :

-> Brought up opensaf on the 4-node setup
-> Ran the IMM test application on Oct 8th and also performed middleware failovers.
-> The setup was then left idle for two days.
-> On Oct 10 14:07:38, stopped opensaf on PL-4, upon which amfd aborted on both controllers.

Oct 10 14:07:38 SLES-SLOT1 osafimmnd[2748]: NO Global discard node received for nodeId:2040f pid:3261
Oct 10 14:07:38 SLES-SLOT1 osafamfd[2788]: NO Node 'PL-4' left the cluster
Oct 10 14:07:38 SLES-SLOT1 osafamfd[2788]: su.cc:2006: dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.
Oct 10 14:07:38 SLES-SLOT1 osafamfnd[2798]: WA AMF director unexpectedly crashed

Below is the back trace :

2 0x7f7426025197 in __osafassert_fail (__file=0x51b4ed "su.cc", __line=2006, __func=0x51ce30 "dec_curr_act_si", __assertion=0x51c884 "saAmfSUNumCurrActiveSIs > 0") at sysf_def.c:281
3 0x004de88c in AVD_SU::dec_curr_act_si (this=0x7bde40) at su.cc:2006
4 0x004c504e in avd_susi_delete (cb=0x75dba0 <_control_block>, susi=0x7eb940, ckpt=false) at siass.cc:554
5 0x0049a326 in SG_NORED::node_fail (this=0x7bc210, cb=0x75dba0 <_control_block>, su=0x7bde40) at sg_nored_fsm.cc:781
6 0x004bd4d7 in avd_node_down_mw_susi_failover (cb=0x75dba0 <_control_block>, avnd=0x7b04d0) at sgproc.cc:1983
7 0x00461a77 in avd_node_failover (node=0x7b04d0) at ndproc.cc:1142
8 0x00459d63 in avd_mds_avnd_down_evh (cb=0x75dba0 <_control_block>, evt=0x7f741c002270) at ndfsm.cc:684
9 0x00453f60 in process_event (cb_now=0x75dba0 <_control_block>, evt=0x7f741c002270) at main.cc:775
10 0x00453c83 in main_loop () at main.cc:696
11 0x004541ff in main (argc=2, argv=0x7fffedc7f828) at main.cc:848

Below is the amfnd trace :

Oct 10 14:07:38.712919 osafamfd [2788:imm.cc:1751] << avd_saImmOiRtObjectDelete
Oct 10 14:07:38.712922 osafamfd [2788:csi.cc:1292] << avd_compcsi_delete
Oct 10 14:07:38.712925 osafamfd [2788:mbcsv_api.c:0773] >> mbcsv_process_snd_ckpt_request: Sending checkpoint data to all STANDBY peers, as per the send-type specified
Oct 10 14:07:38.712928 osafamfd [2788:mbcsv_api.c:0803] TR svc_id:10, pwe_hdl:65537
Oct 10 14:07:38.712931 osafamfd [2788:mbcsv_util.c:0343] >> mbcsv_send_ckpt_data_to_all_peers
Oct 10 14:07:38.712934 osafamfd [2788:mbcsv_util.c:0387] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
Oct 10 14:07:38.712936 osafamfd [2788:mbcsv_act.c:0101] TR ASYNC update to be sent. role: 1, svc_id: 10, pwe_hdl: 65537
Oct 10 14:07:38.712939 osafamfd [2788:mbcsv_util.c:0399] TR calling encode callback
Oct 10 14:07:38.712942 osafamfd [2788:chkop.cc:0228] TR Async update
Oct 10 14:07:38.712945 osafamfd [2788:ckpt_enc.cc:0681] >> enc_siass: io_action '2'
Oct 10 14:07:38.712998 osafamfd [2788:ckpt_enc.cc:0704] << enc_siass
Oct 10 14:07:38.713001 osafamfd [2788:mbcsv_util.c:0438] TR send the encoded message to any other peer with same s/w version
Oct 10 14:07:38.713004 osafamfd [2788:mbcsv_util.c:0441] TR dispatching FSM for NCSMBCSV_SEND_ASYNC_UPDATE
Oct 10 14:07:38.713006 osafamfd [2788:mbcsv_act.c:0101] TR ASYNC update to be sent. role: 1, svc_id: 10, pwe_hdl: 65537
Oct 10 14:07:38.713009 osafamfd [2788:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:1
Oct 10 14:07:38.713012 osafamfd [2788:mbcsv_mds.c:0201] TR send type MDS_SENDTYPE_RED
Oct 10 14:07:38.713023 osafamfd [2788:mbcsv_mds.c:0244] << mbcsv_mds_send_msg: success
Oct 10 14:07:38.713027 osafamfd [2788:mbcsv_util.c:0492] << mbcsv_send_ckpt_data_to_all_peers
Oct 10 14:07:38.713030 osafamfd [2788:mbcsv_api.c:0868] << mbcsv_process_snd_ckpt_request: retval: 1
Oct 10 14:07:38.713033 osafamfd [2788:siass.cc:0496] >> avd_susi_delete: safSu=PL-4,safSg=NoRed,safApp=OpenSAF safSi=NoRed4,safApp=OpenSAF
Oct 10 14:09:23.708873 osafamfd [2802:main.cc:0500] >> initialize
[tickets] [opensaf:tickets] #2106 amf: Admin Operations on middleware SUs / SIs should not be supported
---

** [tickets:#2106] amf: Admin Operations on middleware SUs / SIs should not be supported**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Sun Oct 09, 2016 11:18 AM UTC by Srikanth R
**Last Updated:** Sun Oct 09, 2016 11:18 AM UTC
**Owner:** nobody

Changeset : 8190 5.1.GA

-> Bring up a single controller, SC-1.
-> Now perform lock and unlock operations on a middleware SU, i.e. safSu=SC-2,safSg=NoRed,safApp=OpenSAF, which is hosted on SC-2.
-> The admin lock operation succeeds, but the admin unlock operation times out, with the assignment made to one of the middleware SIs.

Following is the opensafd status after the unlock operation.

safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)

Admin operations on middleware objects should not be supported.
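One way to picture the requested restriction is a guard that recognises middleware objects by their DN before an admin operation is accepted. The sketch below is hypothetical and is not amfd code; it assumes, based on the DNs quoted in this ticket, that middleware SUs and SIs are exactly those whose last RDN is safApp=OpenSAF.

```python
MIDDLEWARE_APP_RDN = "safApp=OpenSAF"  # taken from the DNs in the ticket


def is_middleware_dn(dn):
    """True if the object's DN ends in the OpenSAF application RDN."""
    last_rdn = dn.strip().rsplit(",", 1)[-1]
    return last_rdn == MIDDLEWARE_APP_RDN


def admin_op_allowed(dn):
    """Hypothetical policy: refuse admin operations on middleware objects."""
    return not is_middleware_dn(dn)


print(admin_op_allowed("safSu=SC-2,safSg=NoRed,safApp=OpenSAF"))
print(admin_op_allowed("safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN"))
```

With such a guard, the lock on safSu=SC-2,safSg=NoRed,safApp=OpenSAF would be rejected up front, so the later unlock timeout could not occur.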
[tickets] [opensaf:tickets] #2105 AMF : SG is unstable, if app responds during node link loss detection time period
---

** [tickets:#2105] AMF : SG is unstable, if app responds during node link loss detection time period**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Sun Oct 09, 2016 07:12 AM UTC by Srikanth R
**Last Updated:** Sun Oct 09, 2016 07:12 AM UTC
**Owner:** nobody

Setup :
Changeset : 8190
5-node SLES setup with 2 controllers and 3 payloads (TIPC -- headless enabled)
2N application deployed on 2 payloads.

Issue :

-> Perform an admin operation on an AMF entity.
-> Do not respond to the callback, and invoke the headless scenario.
-> On a VM with a TIPC setup, it takes 3 seconds to detect that the node is down.
-> If the application responds to a callback of the admin operation during this period, when the last controller is already down, the message does not reach any controller. Amfnd on the payload sends the "Assigned" message but does not store it. In this scenario, the SG moves to the unstable state.

Below is the snippet from syslog, where the application responded at 15:48:28 and the payloads detected at 15:48:31 that the last controller is down.

Oct 7 15:48:28 SYSTEST-PLD-1 osafamfnd[9976]: NO Assigned 'safSi=TestApp_SI1,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: WA AMF director unexpectedly crashed
Oct 7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' for pending messages
Oct 7 15:48:31 SYSTEST-PLD-1 osafamfnd[9976]: NO Checking 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' for pending messages
Oct 7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: WA SC Absence IS allowed:900 IMMD service is DOWN
Oct 7 15:48:31 SYSTEST-PLD-1 osafimmnd[9957]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS

-> Below is the scenario in which the payload detected at 18:31:34 that there is no controller; here amfnd calls avnd_di_susi_resp_send after the controllers rejoin the cluster. The application responded at 18:31:41.

Oct 7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: WA SC Absence IS allowed:900 IMMD service is DOWN
Oct 7 18:31:34 SYSTEST-PLD-1 osafimmnd[12448]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO Assigned 'safSi=TestApp_SI4,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 7 18:31:41 SYSTEST-PLD-1 osafamfnd[12467]: NO avnd_di_susi_resp_send() deferred as AMF director is offline
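The difference between the two log excerpts is whether the "Assigned" response is retained on the payload. A generic way to close the detection window is to keep every response until it is acknowledged and to resend whatever is still pending when the director returns. The sketch below illustrates that idea only; the ResponseBuffer class and its methods are hypothetical and do not describe how amfnd is implemented.

```python
class ResponseBuffer:
    """Keep responses until acknowledged so a lost send can be repeated."""

    def __init__(self):
        self._unacked = []

    def send(self, msg, transmit):
        # Retain first, then transmit: the link may already be dead even
        # though the node-down event has not been delivered yet.
        self._unacked.append(msg)
        transmit(msg)

    def ack(self, msg):
        # The director confirmed receipt; the message can be dropped.
        if msg in self._unacked:
            self._unacked.remove(msg)

    def on_director_up(self, transmit):
        # Resend everything that was never acknowledged.
        for msg in list(self._unacked):
            transmit(msg)


received = []
buf = ResponseBuffer()
buf.send("Assigned SI1 ACTIVE to SU1", lambda m: None)  # lost in the 3 s window
buf.on_director_up(received.append)                     # controllers are back
print(received)
```

In the 15:48:28 case from the ticket, a buffer of this kind would still hold the response when the controllers return, instead of leaving the SG waiting for a message that was lost.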
[tickets] [opensaf:tickets] #2100 Standby should not be rebooted, for SC absence configuration mismatch
---

** [tickets:#2100] Standby should not be rebooted, for SC absence configuration mismatch**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Fri Oct 07, 2016 07:11 AM UTC by Srikanth R
**Last Updated:** Fri Oct 07, 2016 07:11 AM UTC
**Owner:** nobody

Changeset : 8190 5.1.GA

-> Initially brought up opensaf on SC-1 with the "SC ABSENCE" feature enabled in immd.conf.
-> On SC-2, the "SC ABSENCE" feature is not enabled in immd.conf. When opensafd was started on SC-2, the node rebooted.

Oct 7 17:58:27 SLES-SLOT2 osafimmd[3615]: ER SC absence allowed in not the same as on active IMMD. Active: 900, Standby: 0. Exiting.
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 7 17:58:27 SLES-SLOT2 osafamfnd[3676]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60

Here the user had configured the two controllers inconsistently, which caused the standby to reboot. Opensafd is enabled in the runlevel as part of installation, so the standby will reboot continuously until opensafd is stopped on SC-1.

Suggested behavior : Instead of rebooting immediately, opensafd should not start on the standby. Also, cluster-level attributes like IMMSV_SC_ABSENCE_ALLOWED can be moved to imm.xml. Node-level attributes, such as trace enabling, can be retained in the configuration files.
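The mismatch reported by osafimmd ("Active: 900, Standby: 0") can also be caught by comparing the two immd.conf files before starting the standby. The helper below is a rough sketch under stated assumptions: it assumes the setting appears in immd.conf as a shell-style line such as "export IMMSV_SC_ABSENCE_ALLOWED=900", that a commented-out or missing line means 0, and both function names are hypothetical.

```python
import re

# Assumed immd.conf syntax; lines starting with '#' do not match.
_SC_ABSENCE = re.compile(
    r"^\s*(?:export\s+)?IMMSV_SC_ABSENCE_ALLOWED=(\d+)", re.MULTILINE
)


def sc_absence_allowed(conf_text):
    """Return the configured value, or 0 when the setting is absent."""
    values = _SC_ABSENCE.findall(conf_text)
    return int(values[-1]) if values else 0  # last assignment wins


def absence_mismatch(active_conf, standby_conf):
    """True when the two controllers disagree on the cluster-level value."""
    return sc_absence_allowed(active_conf) != sc_absence_allowed(standby_conf)


sc1 = "export IMMSV_SC_ABSENCE_ALLOWED=900\n"
sc2 = "#export IMMSV_SC_ABSENCE_ALLOWED=900\n"
print(sc_absence_allowed(sc1), sc_absence_allowed(sc2), absence_mismatch(sc1, sc2))
```

A check of this kind only works around the problem on the operator side; moving the attribute to imm.xml, as suggested in the ticket, would remove the possibility of a per-node mismatch altogether.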
[tickets] [opensaf:tickets] #2096 AMF : SG in unstable state for fault in component during admin unlock (headless)
---

** [tickets:#2096] AMF : SG in unstable state for fault in component during admin unlock (headless)**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 08:08 AM UTC by Srikanth R
**Last Updated:** Wed Oct 05, 2016 08:08 AM UTC
**Owner:** nobody
**Attachments:**

- [2096.tgz](https://sourceforge.net/p/opensaf/tickets/2096/attachment/2096.tgz) (4.6 MB; application/x-compressed-tar)

Environment :
- Changeset: 7997 5.1.FC

Setup : 5-node setup with 2 controllers, headless feature enabled and PBE disabled.
Application : 2N application with 2 SUs and 4 SIs, without si-si deps.

Steps performed :

-- The SG moved to the unstable state after a component fault, when an admin unlock operation was performed on the SG and the headless state was invoked. Below are the steps performed.

-> The application is brought up initially and the SIs are fully assigned.
-> Lock, lock-in, unlock-in and unlock operations were then performed on the SG, with a sufficient time gap between them.
-> During the unlock operation on the SG, component 2 of SU1 did not respond to the active assignment, and the headless scenario was invoked.

3148 12:34:05 10/05/2016 NO safApp=safAmfService "Admin op "UNLOCK" initiated for 'safSg=TestApp_SG1,safApp=TestApp_TwoN', invocation: 1683627180042"
3149 12:34:05 10/05/2016 NO safApp=safAmfService "safSg=TestApp_SG1,safApp=TestApp_TwoN AdmState LOCKED => UNLOCKED"

-> After the headless state was reached, component 2 faulted with a csi set callback timeout.

Oct 5 12:34:33 SYSTEST-PLD-1 osafamfnd[2626]: NO 'safComp=COMP2,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted due to 'csiSetcallbackTimeout' : Recovery is 'componentRestart'

-> After the controllers rejoined the cluster, SU2 did not get any assignments.
--> Further operations on the SG failed because the SG was in the UNSTABLE state.

3202 12:40:59 10/05/2016 NO safApp=safAmfService "Admin op "LOCK" initiated for 'safSg=TestApp_SG1,safApp=TestApp_TwoN', invocation: 1696512081921"
3203 12:40:59 10/05/2016 NO safApp=safAmfService "Admin op invocation: 1696512081921, err: 'SG not in STABLE state (safSg=TestApp_SG1,safApp=TestApp_TwoN)'"
3204 12:40:59 10/05/2016 NO safApp=safAmfService "Admin op done for invocation: 1696512081921, result 6"

Logs :
The traces of SC-1 (active controller before and after headless) and PL-3 (where SU1 is hosted) are attached.
[tickets] [opensaf:tickets] #2088 CLM : saClmClusterNodeGetAsync returns OK on a non member node
---

** [tickets:#2088] CLM : saClmClusterNodeGetAsync returns OK on a non member node**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Oct 03, 2016 07:34 AM UTC by Srikanth R
**Last Updated:** Mon Oct 03, 2016 07:34 AM UTC
**Owner:** nobody

Changeset : 7997 5.1.FC

The saClmClusterNodeGetAsync API returns SA_AIS_OK on a non-member node. The expected behavior is that saClmClusterNodeGetAsync returns SA_AIS_ERR_UNAVAILABLE, like saClmClusterNodeGet_4 does. Currently the saClmClusterNodeGet_4 API returns SA_AIS_ERR_UNAVAILABLE (rc = 31 in the trace) on a non-member node.

Below is the snippet from the CLM agent trace.

Oct 3 12:40:34.532320 clma [7881:clma_api.c:1235] >> saClmClusterNodeGet_4
Oct 3 12:40:34.532330 clma [7881:clma_api.c:1278] >> clmaclusternodeget
Oct 3 12:40:34.532338 clma [7881:clma_util.c:0036] >> clma_validate_version
Oct 3 12:40:34.532345 clma [7881:clma_util.c:0042] << clma_validate_version
Oct 3 12:40:34.532363 clma [7881:clma_mds.c:1227] >> clma_mds_msg_sync_send
Oct 3 12:40:34.532383 clma [7881:clma_mds.c:0317] >> clma_mds_enc
Oct 3 12:40:34.532392 clma [7881:clma_mds.c:0352] T2 msgtype: 0
Oct 3 12:40:34.532399 clma [7881:clma_mds.c:0366] T2 api_info.type: 4
Oct 3 12:40:34.532406 clma [7881:clma_mds.c:0192] >> clma_enc_node_get_msg
Oct 3 12:40:34.532412 clma [7881:clma_mds.c:0207] << clma_enc_node_get_msg
Oct 3 12:40:34.532418 clma [7881:clma_mds.c:0407] << clma_mds_enc
Oct 3 12:40:34.533347 clma [7881:clma_mds.c:0697] >> clma_mds_dec
Oct 3 12:40:34.533377 clma [7881:clma_mds.c:0729] T2 CLMSV_CLMA_API_RESP_MSG rc = 31
Oct 3 12:40:34.533388 clma [7881:clma_mds.c:0809] << clma_mds_dec
Oct 3 12:40:34.533448 clma [7881:clma_mds.c:1253] << clma_mds_msg_sync_send
Oct 3 12:40:34.533474 clma [7881:clma_util.c:0656] >> clma_msg_destroy
Oct 3 12:40:34.533486 clma [7881:clma_util.c:0694] << clma_msg_destroy
Oct 3 12:40:34.533496 clma [7881:clma_api.c:1395] << clmaclusternodeget
Oct 3 12:40:34.533502 clma [7881:clma_api.c:1245] << saClmClusterNodeGet_4
Oct 3 12:40:34.533657 clma [7881:clma_api.c:1422] >> saClmClusterNodeGetAsync
Oct 3 12:40:34.533668 clma [7881:clma_util.c:0036] >> clma_validate_version
Oct 3 12:40:34.533674 clma [7881:clma_util.c:0042] << clma_validate_version
Oct 3 12:40:34.533681 clma [7881:clma_mds.c:1274] >> clma_mds_msg_async_send
Oct 3 12:40:34.533692 clma [7881:clma_mds.c:0317] >> clma_mds_enc
Oct 3 12:40:34.533700 clma [7881:clma_mds.c:0352] T2 msgtype: 0
Oct 3 12:40:34.533707 clma [7881:clma_mds.c:0366] T2 api_info.type: 5
Oct 3 12:40:34.533713 clma [7881:clma_mds.c:0229] >> clma_enc_node_get_async_msg
Oct 3 12:40:34.533720 clma [7881:clma_mds.c:0245] << clma_enc_node_get_async_msg
Oct 3 12:40:34.533726 clma [7881:clma_mds.c:0407] << clma_mds_enc
Oct 3 12:40:34.533744 clma [7881:clma_mds.c:1296] << clma_mds_msg_async_send
Oct 3 12:40:34.533753 clma [7881:clma_api.c:1497] << saClmClusterNodeGetAsync
[tickets] [opensaf:tickets] #2086 LCK : Lock waiter callbacks are not invoked after glnd restart
---

** [tickets:#2086] LCK : Lock waiter callbacks are not invoked after glnd restart**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Sep 30, 2016 06:10 AM UTC by Srikanth R
**Last Updated:** Fri Sep 30, 2016 06:10 AM UTC
**Owner:** nobody

Changeset : 7997 5.1.FC

Lock waiter callbacks are not invoked after glnd restart. Below are the steps performed by the application.

-> Initialize with LCK and store as handle 1.
-> Initialize with LCK and store as handle 2 in another thread.
-> Open a lock using saLckResourceOpen with handle 1.
-> Open the same lock using saLckResourceOpen with handle 2.
-> With handle 2, request lock in PR mode using saLckResourceOpen api. Call this api 5 times.
-> With handle 1, request lock in EX mode.

As handle 2 has requested the lock 5 times, the thread for handle 1 should get 5 lock waiter callbacks. Sometimes the lock waiter callback is not invoked for the thread using handle 1.
[tickets] [opensaf:tickets] #2085 CKPT : IMM attributes for ckpt table are increased by 1, when ckpt open returns TIME_OUT
---

** [tickets:#2085] CKPT : IMM attributes for ckpt table are increased by 1, when ckpt open returns TIME_OUT**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Sep 30, 2016 05:13 AM UTC by Srikanth R
**Last Updated:** Fri Sep 30, 2016 05:13 AM UTC
**Owner:** nobody

Changeset : 7997 5.1.FC

IMM attributes for the ckpt table are increased by 1 when ckpt open returns TIME_OUT. Below is the sequence of steps in which the application uses CKPT.

-> Initialize with ckpt with callbacks. API returned SA_AIS_OK.
-> Invoke selection object. API returned SA_AIS_OK.
-> Create a checkpoint using async option. API returned SA_AIS_OK.
-> Kill ckpnd process.
-> Check for the callbacks and check the IMM attribute of the CKPT object. The callback is invoked with return value ERR_TIMEOUT. The spec mandates that the API be called again to check whether checkpoint creation was successful. If the further call returns ERR_EXIST, the previous call was successful; if it returns SA_AIS_OK, the previous call was unsuccessful.
-> As the callback returned SA_AIS_ERR_TIMEOUT, invoked the async checkpoint creation API again. This time, both the API and the callback returned SA_AIS_OK.

Now if you check the attributes of the CKPT table object, saCkptCheckpointNumOpeners, saCkptCheckpointNumReaders and saCkptCheckpointNumWriters have a value of 2 instead of the expected value 1.
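The recovery rule quoted in ticket #2085 (retry after ERR_TIMEOUT, then judge the first call by the retry's return code) can be written as a small decision function. This is only a sketch of the rule as the ticket states it, not of the CKPT implementation; the function name and the use of strings for return codes are assumptions made for illustration.

```python
def previous_open_succeeded(retry_rc):
    """Interpret the retry result per the rule quoted in the ticket.

    Returns True if the timed-out call is deemed to have succeeded,
    False if it is deemed to have failed, and None if the retry's
    return code does not settle the question.
    """
    if retry_rc == "SA_AIS_ERR_EXIST":
        return True   # checkpoint already exists: first call took effect
    if retry_rc == "SA_AIS_OK":
        return False  # retry created it: first call did not take effect
    return None       # any other code: outcome still unknown


print(previous_open_succeeded("SA_AIS_OK"))
```

In the reported scenario the retry returned SA_AIS_OK, so by this rule the first open should not have taken effect, which is why opener, reader and writer counts of 2 are unexpected.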
[tickets] [opensaf:tickets] #2082 CKPT : Track cbk not invoked for section creation after cpnd restart
---

** [tickets:#2082] CKPT : Track cbk not invoked for section creation after cpnd restart**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 29, 2016 11:06 AM UTC by Srikanth R
**Last Updated:** Thu Sep 29, 2016 11:06 AM UTC
**Owner:** nobody

Changeset: 7997 5.1.FC

Track callback is not invoked after cpnd restart. Below are the APIs called from the applications spawned on two nodes, i.e. payloads.

On first node:
-> Initialize with cpsv.
-> Create a ckpt with ACTIVE REPLICA flag.

On second node:
-> Initialize with cpsv.

On first node:
-> Open the checkpoint in writing mode.
-> Open the checkpoint in reading mode.
-> Kill cpnd process.
-> Register for Track callback.

On second node:
-> Open the ckpt in read mode.
-> Kill cpnd process.
-> Register for Track callback.

After ensuring that both agents registered for the track callback, create a section from the application on the first node. For section creation, the callback should be invoked for the applications on both nodes. Currently the callback is not invoked for the application on the second node. Without cpnd restart, the callback is invoked for both applications.
[tickets] [opensaf:tickets] #2075 LongDnAllowed attribute should be defined in the imm.xml
---

** [tickets:#2075] LongDnAllowed attribute should be defined in the imm.xml**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 27, 2016 12:32 PM UTC by Srikanth R
**Last Updated:** Tue Sep 27, 2016 12:32 PM UTC
**Owner:** nobody

Observed behaviour --

With 5.1, all the active services are integrated with the Long DN feature. To enable the Long DN feature, the user needs to modify the attribute "longDnsAllowed" of the object opensafImm=opensafImm,safApp=safImmService. The steps to enable Long DN are described in the PR doc, but the object is not defined in imm.xml. The Long DN feature can be enabled if the controllers use imm.db or a dumped imm.xml in which the attribute is already set, but not with a generated imm.xml.

Suggested behaviour

The user should be given the option to enable the Long DN feature at initial startup, either by including the attribute in the initially generated imm.xml or through an environment variable in one of the OpenSAF configuration files.
[tickets] [opensaf:tickets] #2074 amfd asserted on rebooted controllers continuously after split brain scenario (headless)
---

** [tickets:#2074] amfd asserted on rebooted controllers continuously after split brain scenario (headless)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 27, 2016 12:14 PM UTC by Srikanth R
**Last Updated:** Tue Sep 27, 2016 12:14 PM UTC
**Owner:** nobody

Setup :
SLES 11 Physical machine
Changeset :7997 5.1 FC
2 controllers and 2 payloads with headless feature enabled.
2N application with 3 SUs. (AmfDemo).

Issue :
amfd asserted on the controllers continuously on every reboot after the initial split-brain scenario was observed.

Steps performed :
-> Initially brought up four nodes and all the nodes joined the cluster.
-> Brought up the 2N application, with SUs hosted on SC-1, SC-2 and PL-3 successfully.
-> Performed some operations on the AMF objects and later left the cluster in idle state.
-> After a gap of 2 weeks, an MDS down event was generated on both controllers because of momentary cable unplugging, which led to a split-brain scenario.

Sep 24 21:36:40 SLES-SLOT1 osafimmd[2729]: NO MDS event from svc_id 25 (change:3, dest:565214187380752)
Sep 24 21:36:40 SLES-SLOT1 kernel: [1297950.833811] TIPC: Established link <1.1.1:em1-1.1.2:em1> on network plane A
Sep 24 21:36:40 SLES-SLOT1 osafrded[2710]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131343, SupervisionTime = 60

Sep 26 00:00:01 SLES-SLOT2 osafrded[2715]: NO Got peer info request from node 0x2010f with role ACTIVE
Sep 26 00:00:01 SLES-SLOT2 osafrded[2715]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Split-brain detected, OwnNodeId = 131599, SupervisionTime = 60

-> As the headless feature is enabled, the payloads did not reboot.
-> Once the controllers rejoined the payloads, amfd asserted on the rebooted controller and the controllers rebooted again.

Sep 24 21:39:27 SLES-SLOT1 osafamfd[2772]: NO Received node_up from 2010f: msg_id 1
Sep 24 21:39:27 SLES-SLOT1 osafamfd[2772]: siass.cc:953: avd_susi_recreate: Assertion 'su' failed.
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: WA AMF director unexpectedly crashed
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: WA AMF director unexpectedly crashed
Sep 24 21:39:27 SLES-SLOT1 osafamfnd[2782]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131343, SupervisionTime = 60

Below is the backtrace :

#0 0x7f1d28510b55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x7f1d28512131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x7f1d2a397197 in __osafassert_fail (__file=0x517c15 "siass.cc", __line=953, __func=0x518250 "avd_susi_recreate", __assertion=0x517d01 "su") at sysf_def.c:281
No locals.
#3 0x004c56a5 in avd_susi_recreate (info=0x7f1d20008ec8) at siass.cc:953
        su = 0x0
        __FUNCTION__ = "avd_susi_recreate"
        susi = 0x0
        node = 0x7bfdf0
        susi_state = 0x0
        su_state = 0x7f1d200055a0
        __PRETTY_FUNCTION__ = "SaAisErrorT avd_susi_recreate(AVSV_N2D_ND_SISU_STATE_MSG_INFO*)"
#4 0x00459943 in avd_process_state_info_queue (cb=0x75cba0 <_control_block>) at ndfsm.cc:78
        n2d_msg = 0x7f1d20008ec0
        i = 0
        queue_size = 4
        queue_evt = 0x7a9b60
        act_amfnd_node_up_count = 1
        found_state_info = true
        __FUNCTION__ = "avd_process_state_info_queue"
#5 0x0045a50f in avd_node_up_evh (cb=0x75cba0 <_control_block>, evt=0x7f1d20008880) at ndfsm.cc:363
        avnd = 0x7bf380
        n2d_msg = 0x7f1d20004b30
        rc = 1
        sync_nd_size = 4
        act_nd = true
        __FUNCTION__ = "avd_node_up_evh"
#6 0x00453d78 in process_event (cb_now=0x75cba0 <_control_block>, evt=0x7f1d20008880) at main.cc:768
        __FUNCTION__ = "process_event"
#7 0x00453a9b in main_loop () at main.cc:689
        pollretval = 1
        cb = 0x75cba0 <_control_block>
        evt = 0x7f1d20008880
        mbx_fd = {raise_obj = 11, rmv_obj = 12}
        error = SA_AIS_OK
        polltmo = -1
        term_fd = 17
        __FUNCTION__ = "main_loop"
#8 0x00454017 in main (argc=2, argv=0x7fff50cd9958) at main.cc:841

Suggested recovery :
During a split-brain scenario, payloads should be ordered to reboot even when the headless feature is enabled.
[tickets] [opensaf:tickets] #2070 LCK : IMM attrib update issues for LCK application objects.
---

** [tickets:#2070] LCK : IMM attrib update issues for LCK application objects.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 27, 2016 07:12 AM UTC by Srikanth R
**Last Updated:** Tue Sep 27, 2016 07:12 AM UTC
**Owner:** nobody

Following are the two scenarios in which IMM attributes for a LCK object are not properly updated.

SCENARIO - 1 :
-> Invoke saLckInitialize.
-> Invoke saLckResourceOpen with SA_LCK_RESOURCE_CREATE flag for the resource "resource1_101".
-> Invoke saLckResourceOpenAsync with SA_LCK_RESOURCE_CREATE flag for the earlier resource.
-> Invoke saLckResourceLock with SA_LCK_LOCK_ORPHAN flag in PR mode.
-> Invoke saLckResourceLock with SA_LCK_LOCK_ORPHAN flag in PR mode.
-> Invoke saLckFinalize.

Now that the agent has invoked Finalize, the stripped count should be zero for the LCK object "safLock=resource1_101". But the saLckResourceStrippedCount value is 2.

SCENARIO - 2 :
On node 1 :
-> Invoke saLckInitialize.
-> Invoke saLckResourceOpen with SA_LCK_RESOURCE_CREATE flag.

On node 2 :
-> Invoke saLckInitialize.
-> Invoke saLckResourceOpen for the same resource as on node 1.
-> Invoke saLckResourceLock in PR mode.
-> Invoke saLckResourceLock with SA_LCK_LOCK_ORPHAN flag.
-> Invoke saLckFinalize.

Now on node 1, where the application is still running, the saLckResourceNumOpeners attribute of the resource is expected to be 1, but 0 is populated. The saLckResourceIsOrphaned attribute is expected to be 1, but it is set to 0.
[tickets] [opensaf:tickets] #1801 lck: saLckResourceOpen returns SA_AIS_ERR_TIMEOUT / SA_AIS_ERR_LIBRARY after failovers / switchovers.
- **summary**: lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE returning SA_AIS_ERR_TIMEOUT after 5 failovers. --> lck: saLckResourceOpen returns SA_AIS_ERR_TIMEOUT / SA_AIS_ERR_LIBRARY after failovers / switchovers.
- **Comment**:

After a couple of switchovers / failovers, saLckResourceOpen may fail randomly with the following return values.
-> SA_AIS_ERR_TIMEOUT
-> SA_AIS_ERR_LIBRARY
-> random return values that are out of bounds

---

** [tickets:#1801] lck: saLckResourceOpen returns SA_AIS_ERR_TIMEOUT / SA_AIS_ERR_LIBRARY after failovers / switchovers.**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Mon May 02, 2016 09:52 AM UTC by Madhurika Koppula
**Last Updated:** Tue Sep 20, 2016 06:04 PM UTC
**Owner:** nobody
**Attachments:**

- [glsv.tgz](https://sourceforge.net/p/opensaf/tickets/1801/attachment/glsv.tgz) (3.0 MB; application/octet-stream)

Setup:
Changeset- 7436
OS: Oracle Linux Server release 6.4 (x86_64)
4 nodes configured with single PBE

Some failover tests are being run. The safLock=resource1_101 object is not getting deleted, so saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE continuously returns SA_AIS_ERR_TIMEOUT. The same API call is retried 15 times with a sleep of 10 secs between attempts.

Snippet from the run:

100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle : 6599312
100|7|
100|7|
100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle : 6599392
100|7|
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7| FAILED : saLckResourceOpen with valid parameters
100|7| Return Value: SA_AIS_ERR_TIMEOUT
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
Timeout count exceeded: 15

Timestamp of the Active controller at this instant:

May 2 14:22:56 OEL_M-SLOT-2 root: killing osafimmd from run_failover.sh
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
May 2 14:22:56 OEL_M-SLOT-2 opensaf_reboot: Rebooting local node; timeout=60

Timestamp of the Standby controller which is becoming active after failover:

May 2 14:23:00 OEL_M-SLOT-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
May 2 14:23:00 OEL_M-SLOT-1 osaffmd[1677]: NO Controller Failover: Setting role to ACTIVE
May 2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO RDE role set to ACTIVE
May 2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO Running '/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
May 2 14:23:00 OEL_M-SLOT-1 osafimmd[1688]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osaflogd[1711]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafntfd[1722]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafclmd[1733]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafamfd[1744]: NO FAILOVER StandBy --> Active

/var/log/messages and osaflckd traces of both controllers are attached.
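The retry policy described in ticket #1801 (up to 15 attempts on the same call, 10 seconds apart, giving up with "Timeout count exceeded: 15") might be sketched as follows. This is an illustration under stated assumptions, not the test suite's code: `open_resource` is a stand-in callable rather than a real saLckResourceOpen binding, return codes are modelled as strings, and the `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time


def open_with_retry(open_resource, attempts=15, delay_s=10, sleep=time.sleep):
    """Call open_resource() until it stops returning SA_AIS_ERR_TIMEOUT.

    Returns a tuple (last_return_code, calls_made).
    """
    rc = None
    for call in range(1, attempts + 1):
        rc = open_resource()
        if rc != "SA_AIS_ERR_TIMEOUT":
            return rc, call          # any non-timeout result ends the loop
        if call < attempts:
            sleep(delay_s)           # wait before the next attempt
    return rc, attempts              # timeout count exceeded


# Hypothetical stand-in that always times out, as in the reported run.
rc, calls = open_with_retry(lambda: "SA_AIS_ERR_TIMEOUT", sleep=lambda s: None)
print(rc, calls)
```

A loop of this shape only masks transient timeouts; in the reported run every attempt timed out, which points at the undeleted safLock=resource1_101 object rather than at the retry budget.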
[tickets] [opensaf:tickets] #2067 EVT : saEvtEventPublish returns BAD_HANDLE, during middleware si-swap operation
- **summary**: EVT : Api returns BAD_HANDLE, during middleware si-swap operation --> EVT : saEvtEventPublish returns BAD_HANDLE, during middleware si-swap operation
- Description has changed:

Diff:

--- old
+++ new
@@ -2,9 +2,7 @@
 Setup : 2 controllers and 2 payloads with headless feature disabled.
- Evt api returns SA_AIS_ERR_BAD_HANDLE during middleware si-swap operation.
-
-In the below scenario, saEvtEventPublish returns BAD_HANDLE. The api is called, after invoking middleware switchover.
+ The saEvtEventPublish api is called with proper handle, after just invoking middleware switchover. The api returned SA_AIS_ERR_BAD_HANDLE.
 Sep 26 17:42:10.861357 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: event handle: ff84
 Sep 26 17:42:10.861494 imma [11005:eda_saf_api.c:0363] << saEvtDispatch

---

** [tickets:#2067] EVT : saEvtEventPublish returns BAD_HANDLE, during middleware si-swap operation**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Sep 26, 2016 12:44 PM UTC by Srikanth R
**Last Updated:** Mon Sep 26, 2016 12:44 PM UTC
**Owner:** nobody

Changeset : 7997 5.1.FC

Setup : 2 controllers and 2 payloads with headless feature disabled.

The saEvtEventPublish API was called with a proper handle just after invoking a middleware switchover. The API returned SA_AIS_ERR_BAD_HANDLE.

Sep 26 17:42:10.861357 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: event handle: ff84
Sep 26 17:42:10.861494 imma [11005:eda_saf_api.c:0363] << saEvtDispatch
Sep 26 17:42:11.340907 imma [11005:eda_mds.c:0943] T1 Event Server is DOWN on node_id: 0
Sep 26 17:42:11.345418 imma [11005:eda_saf_api.c:2097] >> saEvtEventPublish: Allocated event handle: ffc00029
Sep 26 17:42:11.345457 imma [11005:eda_saf_api.c:2127] T2 Unable to retrieve allocated event handle: ffc00029
Sep 26 17:42:11.345471 imma [11005:eda_saf_api.c:2128] << saEvtEventPublish
Sep 26 17:42:11.347997 imma [11005:eda_saf_api.c:2364] >> saEvtEventSubscribe: channel handle: ffd00021
Sep 26 17:42:11.348081 imma [11005:eda_saf_api.c:2447] T2 event server is not yet up
Sep 26 17:42:11.348121 imma [11005:eda_saf_api.c:2448] << saEvtEventSubscribe
Sep 26 17:42:11.861956 imma [11005:eda_saf_api.c:0320] >> saEvtDispatch: event handle: ff84
Sep 26 17:42:11.862080 imma [11005:eda_saf_api.c:0363] << saEvtDispatch
Sep 26 17:42:12.299715 imma [11005:ntfa_mds.c:0388] T2 NTFA Rcvd MDS subscribe evt from svc 28
Sep 26 17:42:12.299736 imma [11005:ntfa_mds.c:0398] TR NTFS down
Sep 26 17:42:12.299773 imma [11005:ntfa_util.c:1499] >> ntfa_update_ntfsv_state
Sep 26 17:42:12.299782 imma [11005:ntfa_util.c:1501] T1 Current state: 4, Changed state: 2
Sep 26 17:42:12.299790 imma [11005:ntfa_util.c:1542] TR Active NTF server temporarily unavailable
Sep 26 17:42:12.299796 imma [11005:ntfa_util.c:1554] << ntfa_update_ntfsv_state

This issue is observed randomly and was not observed in the earlier release.
[tickets] [opensaf:tickets] #2059 AMF: SU stuck in terminating state, for failure in component restart
---

** [tickets:#2059] AMF: SU stuck in terminating state, for failure in component restart**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 22, 2016 07:05 AM UTC by Srikanth R
**Last Updated:** Thu Sep 22, 2016 07:05 AM UTC
**Owner:** nobody
**Attachments:**

- [2059.tgz](https://sourceforge.net/p/opensaf/tickets/2059/attachment/2059.tgz) (821.7 kB; application/x-compressed-tar)

Setup :
Changeset : 7997 5.1.FC
5 node setup with 3 payloads.
App : 2N PI Application with SUs hosted on PL-3, PL-4, SC-2.

Issue : SU stuck in terminating state after a failure in component restart.

Steps performed :
-> Initially brought up the attached AMF configuration and 2 SUs got assigned successfully.
-> Moved the instantiation script of SU1 away so that it is no longer available. Note that the termination script and the instantiation script are different.
-> Killed the component of SU1. The component went for restart, the assignments were removed from SU1, and SU2 & SU3 became active & standby.
-> As the instantiation script is not available, the SU should move to instantiation failed state. But the SU is stuck in terminating state.
-> The lock operation on the SU succeeded, but the lock-instantiation operation on the SU failed with SG unstable.

175 12:28:31 09/22/2016 NO safApp=safAmfService "Admin op "LOCK" initiated for 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN', invocation: 408021893121"
176 12:28:31 09/22/2016 NO safApp=safAmfService "safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN AdmState UNLOCKED => LOCKED"
177 12:28:31 09/22/2016 NO safApp=safAmfService "Admin op done for invocation: 408021893121, result 1"
178 12:28:39 09/22/2016 NO safApp=safAmfService "Admin op "LOCK_INSTANTIATION" initiated for 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN', invocation: 412316860417"
179 12:28:39 09/22/2016 NO safApp=safAmfService "Admin op invocation: 412316860417, err: ''safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' presence state is '4''"
180 12:28:39 09/22/2016 NO safApp=safAmfService "Admin op done for invocation: 412316860417, result 6"
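Log entry 179 for ticket #2059 reports the SU's presence state only as a number (`presence state is '4'`). A minimal sketch for decoding it is shown below, using the `SaAmfPresenceStateT` values from the SA Forum AMF specification. The helper name and the regular expression are illustrative assumptions, not OpenSAF code.

```python
import re

# SaAmfPresenceStateT values (SA Forum AMF), with the SA_AMF_PRESENCE_ prefix dropped.
PRESENCE_STATES = {
    1: "UNINSTANTIATED",
    2: "INSTANTIATING",
    3: "INSTANTIATED",
    4: "TERMINATING",
    5: "RESTARTING",
    6: "INSTANTIATION_FAILED",
    7: "TERMINATION_FAILED",
}


def presence_state_from_log(line):
    """Extract and name the presence state from an amfd admin-op error line."""
    match = re.search(r"presence state is '(\d+)'", line)
    if match is None:
        return None
    return PRESENCE_STATES.get(int(match.group(1)), "UNKNOWN")


entry = ("Admin op invocation: 412316860417, err: "
         "''safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' "
         "presence state is '4''")
print(presence_state_from_log(entry))
```

The value 4 corresponds to the terminating state, which matches the ticket's description; instantiation failed would have been reported as 6.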
[tickets] [opensaf:tickets] #2042 EVT : Application segfaulted in MDS callback processing
- **summary**: EVT : Application segfaulted during --> EVT : Application segfaulted in MDS callback processing

---

** [tickets:#2042] EVT : Application segfaulted in MDS callback processing**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri Sep 16, 2016 12:22 PM UTC by Srikanth R
**Last Updated:** Fri Sep 16, 2016 12:22 PM UTC
**Owner:** nobody
**Attachments:**

- [eda_bt](https://sourceforge.net/p/opensaf/tickets/2042/attachment/eda_bt) (59.5 kB; application/octet-stream)

Setup : 7997 5.1.FC

Issue : Application segfaulted on payload in MDS callback processing by EVT thread.

Below is the backtrace.

0 0x7ff2f5282d64 in ncs_decode_32bit (stream=0x7ff2f6b95c98) at hj_dec.c:197
1 0x7ff2f5f181e4 in eda_mds_dec (info=0x7ff2f6b95dd0) at eda_mds.c:1285
2 0x7ff2f5f185fa in eda_mds_callback (info=0x7ff2f6b95dd0) at eda_mds.c:1440
3 0x7ff2f52b887b in mds_mcm_do_decode_full_or_flat (svccb=0x639c40, cbinfo=0x7ff2f6b95dd0, recv_msg=0x7aace8, orig_msg=0x0) at mds_c_sndrcv.c:4915
4 0x7ff2f52b7841 in mds_mcm_process_recv_snd_msg_common (svccb=0x639c40, recv=0x7aace8) at mds_c_sndrcv.c:4255
5 0x7ff2f52b7f24 in mcm_recv_normal_snd (svccb=0x639c40, recv=0x7aace8) at mds_c_sndrcv.c:4389
6 0x7ff2f52b7305 in mds_mcm_ll_data_rcv (recv=0x7aace8) at mds_c_sndrcv.c:4067
7 0x7ff2f52a54ac in mdtm_process_recv_message_common (flag=0 '\000', buffer=0x61424a "\252", len=167, transport_adest=72075191086465088, seq_num_check=30108, buff_dump=0x7ff2f6b961bc) at mds_dt_common.c:505
8 0x7ff2f52a626f in mdtm_process_recv_data (buffer=0x614242 "", len=175, transport_adest=72075191086465088, buff_dump=0x7ff2f6b961bc) at mds_dt_common.c:949
9 0x7ff2f52c952f in mdtm_process_recv_events () at mds_dt_tipc.c:793
10 0x7ff2f586c7b6 in start_thread () from /lib64/libpthread.so.0
11 0x7ff2f55c89cd in clone () from /lib64/libc.so.6

The entire backtrace is attached as an attachment. This issue is observed in earlier releases also.
[tickets] [opensaf:tickets] #1486 smf : SMFD asserted in csi active callback during switchovers ( ncs_sel_obj_create: socketpair failed )
- **summary**: SMFD faulted in active callback during switchovers --> smf : SMFD asserted in csi active callback during switchovers ( ncs_sel_obj_create: socketpair failed )
- **Component**: unknown --> smf

---

** [tickets:#1486] smf : SMFD asserted in csi active callback during switchovers ( ncs_sel_obj_create: socketpair failed )**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Wed Sep 16, 2015 10:04 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 07:27 PM UTC
**Owner:** nobody

Setup:
4.6GA with changeset 6490
4 nodes (OEL6.4 with TIPC version 1.7.7) with no PBE configured

Issue observed:
> Cluster went for reboot during switchover as SMFD faulted due to 'csiSetcallbackFailed'

Steps performed:
* Continuous switchovers are invoked on the setup.
* After more than 1000 switchovers, the standby controller (SC-2) rebooted while being promoted to ACTIVE state, because SMFD failed in the active callback.

Sep 16 06:25:00 SLOT-2 osafsmfd[1926]: ER amf_active_state_handler oi activate FAIL
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: ER safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackFailed Recovery is:nodeFailfast
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60

* After SC-2 went for reboot, SC-1 tried to become active again, and smfd also faulted on the newly promoted active controller.

Sep 16 06:25:00 SLOT-1 root: Invoking switchover from invoke_switchover.sh
Sep 16 06:25:00 SLOT-1 osafamfd[3830]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Sep 16 06:25:00 SLOT-1 osafamfnd[3845]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 16 06:25:00 SLOT-1 osafsmfd[3871]: ncs_sel_obj_create: socketpair failed - Too many open files
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Resetting link <1.1.1:eth0-1.1.2:eth1>, peer not responding
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Lost link <1.1.1:eth0-1.1.2:eth1> on network plane A
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Lost contact with <1.1.2>
Sep 16 06:25:05 SLOT-1 osaffmd[3716]: NO Node Down event for node id 2020f:
Sep 16 06:25:06 SLOT-1 osafimmnd[3746]: NO This IMMND re-elected coord redundantly, failover ?
Sep 16 06:25:06 SLOT-1 osafsmfd[3871]: ncs_sel_obj_create: socketpair failed - Too many open files
Sep 16 06:25:06 SLOT-1 osafsmfd[3871]: ER immutil_saImmOiInitialize_2 fail, rc = 2
...
Sep 16 06:25:06 SLOT-1 osafamfnd[3845]: ER safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackFailed Recovery is:nodeFailfast
Sep 16 06:25:06 SLOT-1 osafamfnd[3845]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
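The "socketpair failed - Too many open files" message in the SLOT-1 log is the text for EMFILE, which a process gets once it reaches its open-descriptor limit. The ticket does not establish where the descriptors went, so the following is only a minimal sketch, assuming a Linux host with /proc, of how a handler that creates a socket pair per state change and never closes it would eat into that limit. The function names are hypothetical and this is not OpenSAF code.

```python
import os
import resource
import socket


def open_fd_count():
    """Number of descriptors open in this process (Linux: /proc/self/fd)."""
    return len(os.listdir("/proc/self/fd"))


def fd_headroom():
    """Descriptors left before the soft RLIMIT_NOFILE limit; None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - open_fd_count()


def leaky_state_change(leaked):
    """Hypothetical handler: creates a socket pair and keeps it open."""
    leaked.append(socket.socketpair())


def clean_state_change():
    """Same handler, but both ends are closed before returning."""
    a, b = socket.socketpair()
    a.close()
    b.close()


if __name__ == "__main__":
    leaked = []
    start = open_fd_count()
    for _ in range(10):
        leaky_state_change(leaked)
        clean_state_change()
    print("descriptors added by 10 cycles:", open_fd_count() - start)
    print("headroom before EMFILE:", fd_headroom())
    for a, b in leaked:
        a.close()
        b.close()
```

On a live system the same check can be made against the daemon itself by counting the entries under /proc/<pid>/fd between switchovers; a count that grows steadily would point to a leak of this kind.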
[tickets] [opensaf:tickets] #1765 ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
- **summary**: saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover --> ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
- **Comment**:

Application output with syslog running as background process.

Sep 15 18:46:10 SYSTEST-PLD-1 kernel: [ 1204.300498] TIPC: Established link <1.1.3:eth3-1.1.2:eth3> on network plane A
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19001
Sep 15 18:46:11 SYSTEST-PLD-1 osafimmnd[4936]: NO Epoch set to 4 in ImmModel
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 14 (MsgQueueService131599) <0, 2020f>
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 15 (@safAmfService2020f) <0, 2020f>
Sep 15 18:46:12 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 16 (@OpenSafImmReplicatorB) <0, 2020f>
SYSTEST-PLD-1:/home//cpsv_fo #
*** Demonstrating Checkpoint Service Usage with a collocated Checkpoint ***
Initialising With Checkpoint Service
Sep 15 18:46:13 SYSTEST-PLD-1 a.out: logtrace: trace enabled to file /home//cpsv_fo/ckpt.trace, mask=0x
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService with create flags
PASSED
Press key to continue...
Invoke failover
Sep 15 18:46:51 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer disconnected 8 <0, 2010f> (safEvtService)
Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704238] TIPC: Resetting link <1.1.3:eth3-1.1.1:eth0>, peer not responding
Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704251] TIPC: Lost link <1.1.3:eth3-1.1.1:eth0> on network plane A
Sep 15 18:46:56 SYSTEST-PLD-1 kernel: [ 1250.704259] TIPC: Lost contact with <1.1.1>
...
...
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 23 (safEvtService) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 24 (@safLogService_appl) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 25 (safSmfService) <0, 2020f>
Sep 15 18:46:57 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 26 (@OpenSafImmReplicatorA) <0, 2020f>
**Unlink My Checkpoint Failed :5**
Ckpt Finalize being called
PASSED
SYSTEST-PLD-1:/home//cpsv_fo #
Sep 15 18:47:17 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 27 (MsgQueueService131343) <0, 2020f>
Sep 15 18:47:17 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer disconnected 27 <0, 2020f> (MsgQueueService131343)
Sep 15 18:47:18 SYSTEST-PLD-1 kernel: [ 1272.242604] TIPC: Established link <1.1.3:eth3-1.1.1:eth0> on network plane A
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19001
Sep 15 18:47:19 SYSTEST-PLD-1 osafimmnd[4936]: NO Epoch set to 5 in ImmModel
Sep 15 18:47:20 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer connected: 28 (MsgQueueService131343) <0, 2010f>
Sep 15 18:47:21 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 29 (@safAmfService2010f) <0, 2010f>
Sep 15 18:47:21 SYSTEST-PLD-1 osafimmnd[4936]: NO Implementer (applier) connected: 30 (@OpenSafImmReplicatorB) <0, 2010f>
SYSTEST-PLD-1:/home//cpsv_fo # ./a.out
*** Demonstrating Checkpoint Service Usage with a collocated Checkpoint ***
Initialising With Checkpoint Service
Sep 15 18:48:24 SYSTEST-PLD-1 a.out: logtrace: trace enabled to file /home//cpsv_fo/ckpt.trace, mask=0x
PASSED
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService
**Ckpt open Failed (2).** Hence exiting

---

** [tickets:#1765] ckpt : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 15, 2016 01:27 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**
- [ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2) (3.2 MB; application/x-bzip)

setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed : saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
* Steps to reproduce:
> Ran couple of failover and observed saCkptCheckpointOpen failed.
> below is the snippet of agent trace:

Apr 15 8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_
[tickets] [opensaf:tickets] #1765 saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
Hi Pham,

We applied the patch on 5.0 GA and the issue is still observed. Below are the steps and the APIs used in the application to reproduce the issue.

Application :
-> Invoke saCkptInitialize
-> Invoke saCkptCheckpointOpen with create flag and SA_CKPT_WR_ACTIVE_REPLICA_WEAK.
-> Invoke saCkptCheckpointOpen with WRITE flag
-> Wait for user to press enter ( to invoke failover )
-> Invoke saCkptCheckpointUnlink
-> Invoke saCkptFinalize

Steps to reproduce the issue :
-> Initially start a single controller and payload.
-> Start the other controller, which shall join as standby.
-> While the standby controller is joining, invoke the application on the payload, so that the CKPT APIs are invoked while CKPT cold sync is in progress.
-> After a sleep of 20 seconds, induce middle failover and later unblock the application, after which the unlink and finalize APIs are invoked. The unlink API returns TIME_OUT and the IMM objects are not deleted from the DB:

immfind | grep -i Demo
safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=DemoCkpt,safApp=safCkptService

-> If the application is invoked again, checkpoint open returns SA_AIS_ERR_LIBRARY.
-> At this stage, if the application is invoked twice, ckptd segfaults; ticket #2011 is raised regarding that.

This issue (#1765) seems to be similar to #247, which has been closed as non-reproducible. Sometimes checkpoint open also gets SA_AIS_ERR_RESOURCES, as mentioned in #247.

-- Srikanth

Attachments:
- [1765.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/8ea9d424/d730/attachment/1765.tgz) (111.5 kB; application/x-compressed-tar)

---

** [tickets:#1765] saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 06:56 PM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**
- [ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2) (3.2 MB; application/x-bzip)

setup:
Changeset- 7436
Version - opensaf 5.0 FC
4 nodes configured with single PBE and a load of 30K objects

* Issue observed : saCkptCheckpointOpen API call failed, returning SA_AIS_ERR_LIBRARY after a couple of failovers
* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is the snippet of agent trace:

Apr 15 8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: retval = 1
Apr 15 8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with return value:2,ckptHandle:63
Apr 15 8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.
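The traces in this thread report bare numeric return codes ("Unlink My Checkpoint Failed :5", "API return code = 2"). As a reading aid, here is a small sketch that maps the first values of the SA Forum `SaAisErrorT` enumeration to their names. The numbering follows the AIS header as far as I know and agrees with the pairs the thread itself gives (5 with TIME_OUT, 2 with SA_AIS_ERR_LIBRARY); the `decode` helper is a hypothetical name, not part of any OpenSAF tool.

```python
from enum import IntEnum


class SaAisError(IntEnum):
    """First values of SaAisErrorT; later values are omitted in this sketch."""
    SA_AIS_OK = 1
    SA_AIS_ERR_LIBRARY = 2
    SA_AIS_ERR_VERSION = 3
    SA_AIS_ERR_INIT = 4
    SA_AIS_ERR_TIMEOUT = 5
    SA_AIS_ERR_TRY_AGAIN = 6
    SA_AIS_ERR_INVALID_PARAM = 7
    SA_AIS_ERR_NO_MEMORY = 8
    SA_AIS_ERR_BAD_HANDLE = 9
    SA_AIS_ERR_BUSY = 10
    SA_AIS_ERR_ACCESS = 11
    SA_AIS_ERR_NOT_EXIST = 12
    SA_AIS_ERR_NAME_TOO_LONG = 13
    SA_AIS_ERR_EXIST = 14
    SA_AIS_ERR_NO_SPACE = 15
    SA_AIS_ERR_INTERRUPT = 16
    SA_AIS_ERR_NAME_NOT_FOUND = 17
    SA_AIS_ERR_NO_RESOURCES = 18


def decode(rc):
    """Return the symbolic name for a numeric AIS return code."""
    try:
        return SaAisError(rc).name
    except ValueError:
        return "UNKNOWN({})".format(rc)


if __name__ == "__main__":
    for rc in (5, 2):
        print(rc, decode(rc))
```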
[tickets] [opensaf:tickets] #2036 build : make rpm fails, if installation directories are specified
---

** [tickets:#2036] build : make rpm fails, if installation directories are specified**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:03 AM UTC by Srikanth R
**Last Updated:** Thu Sep 15, 2016 06:03 AM UTC
**Owner:** nobody

Environment :
Setup : SLES 64bit gcc 6.1

Steps performed :
Ran the following commands after downloading the opensaf from hg.
-> ./bootstrap.sh
-> ./configure CFLAGS="-g " CXXFLAGS="-g " --enable-tipc --enable-imm-pbe --enable-ntf-imcn --sysconfdir=/opt/etc --localstatedir=/opt/var --libdir=/opt/usr/lib
-> make rpm

The last step fails with the following error.

Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root
error: Installed (but unpackaged) file(s) found:
/opt/etc/opensaf/amfd.conf
/opt/etc/opensaf/amfnd.conf
/opt/etc/opensaf/amfwdog.conf
/opt/etc/opensaf/chassis_id
/opt/etc/opensaf/ckptd.conf
/opt/etc/opensaf/ckptnd.conf
/opt/etc/opensaf/clmd.conf
/opt/etc/opensaf/clmna.conf
/opt/etc/opensaf/dtmd.conf
RPM build errors:
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/lib/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/log/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/run/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/chassis_id
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/slot_id
.
File not found by glob: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/libSa*.a
Installed (but unpackaged) file(s) found:
/opt/etc/opensaf/amfd.conf
/opt/etc/opensaf/amfnd.conf
/opt/etc/opensaf/amfwdog.conf
/opt/etc/opensaf/chassis_id
/opt/etc/opensaf/ckptd.conf
/opt/etc/opensaf/ckptnd.conf
/opt/etc/opensaf/clmd.conf
/opt/etc/opensaf/clmna.conf
/opt/etc/opensaf/dtmd.conf
...
/opt/usr/lib/pkgconfig/opensaf-smf.pc
/opt/usr/lib/pkgconfig/opensaf.pc
make: *** [rpm] Error 1
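The two error classes in this output are two sides of one mismatch: the paths the package file list expects (under /etc, /var, /usr/lib64) versus the paths the configured install actually populated (under /opt). A minimal sketch of that comparison, using a few paths copied from the log above; the `classify` helper is hypothetical, and the assumption that the file list is written against the default directories is an inference from the log, not something the ticket confirms.

```python
def classify(expected, installed):
    """Split a path mismatch into the two groups that rpmbuild reports."""
    missing = sorted(set(expected) - set(installed))     # "File not found"
    unpackaged = sorted(set(installed) - set(expected))  # "Installed (but unpackaged)"
    return missing, unpackaged


if __name__ == "__main__":
    # Paths relative to the build root, taken from the log in this ticket.
    expected = ["/etc/opensaf/chassis_id", "/etc/opensaf/slot_id"]
    installed = ["/opt/etc/opensaf/chassis_id", "/opt/etc/opensaf/amfd.conf"]
    missing, unpackaged = classify(expected, installed)
    print("File not found:", missing)
    print("Installed (but unpackaged):", unpackaged)
```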
[tickets] [opensaf:tickets] #2022 AMF : amfd asserted for NG lock operation ( quiesced timeout - Nway model))
Attaching the logs.

Attachments:
- [2022.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2d0d5691/727c/attachment/2022.tgz) (1.1 MB; application/x-compressed-tar)

---

** [tickets:#2022] AMF : amfd asserted for NG lock operation ( quiesced timeout - Nway model))**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 09:58 AM UTC by Srikanth R
**Last Updated:** Mon Sep 12, 2016 07:21 AM UTC
**Owner:** Praveen
**Attachments:**
- [createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/2022/attachment/createAppTestApp.sh) (15.8 kB; text/x-shellscript)

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & no PBE )
AMF Application : NPM model with SUs mapped on SC-2,PL-3,PL-4

Summary :
--
AMFD on both controllers asserted when the Nway application failed in the CSI SET QUIESCED callback during the lock operation of a node group.

Steps followed & Observed behaviour
--
-> Hosted the nway application on PL-3, PL-4 and SC-2 and brought up the application. Configuration is attached to the ticket.
-> Created a node group with all the three nodes.
-> Ensured that one of the components does not respond to the quiesced callback.
-> Performed the lock operation on the node group.
-> amfd on both controllers asserted with the following back trace.

0 0x7f66fbc6fb55 in raise () from /lib64/libc.so.6
1 0x7f66fbc71131 in abort () from /lib64/libc.so.6
2 0x7f66fda6816a in __osafassert_fail (__file=0x51214d "su.cc", __line=2022, __func=0x513aa0 "dec_curr_stdby_si", __assertion=0x51355f "saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:281
3 0x004d68cd in AVD_SU::dec_curr_stdby_si (this=0x7ccf40) at su.cc:2022
4 0x004be804 in avd_susi_update_assignment_counters (susi=0x78c670, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at siass.cc:783
5 0x004be59b in avd_susi_del_send (susi=0x78c670) at siass.cc:714
6 0x004af12e in avd_sg_nway_node_fail_stable (cb=0x751b80, su=0x800470, susi=0x0) at sg_nway_fsm.cc:3022
7 0x004b025d in avd_sg_nway_node_fail_sg_realign (cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:3493
8 0x004a8042 in SG_NWAY::node_fail (this=0x797c50, cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:497
9 0x004b209e in sg_su_failover_func (su=0x800470) at sgproc.cc:525
10 0x004b2d16 in avd_su_oper_state_evh (cb=0x751b80, evt=0x7f66f4002940) at sgproc.cc:838
11 0x00450ba9 in process_event (cb_now=0x751b80, evt=0x7f66f4002940) at main.cc:768
12 0x004508cd in main_loop () at main.cc:689
13 0x00450e43 in main (argc=2, argv=0x7fff0f81ab18) at main.cc:841
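The back trace ends in the assertion `saAmfSUNumCurrStandbySIs > 0` inside `dec_curr_stdby_si`, reached from the SUSI delete path. That kind of check fires when a counter is decremented more often than it was incremented. The sketch below only illustrates the mechanism with a hypothetical `SuCounters` class; which code path in amfd produces the extra decrement is not stated in the ticket and is not claimed here.

```python
class SuCounters:
    """Hypothetical stand-in for the per-SU standby assignment counter."""

    def __init__(self, standby_sis=0):
        self.num_curr_standby_sis = standby_sis

    def inc_curr_stdby_si(self):
        self.num_curr_standby_sis += 1

    def dec_curr_stdby_si(self):
        # Mirrors the asserted invariant: never decrement below zero.
        if self.num_curr_standby_sis <= 0:
            raise AssertionError("saAmfSUNumCurrStandbySIs > 0")
        self.num_curr_standby_sis -= 1


if __name__ == "__main__":
    su = SuCounters()
    su.inc_curr_stdby_si()   # one standby assignment recorded
    su.dec_curr_stdby_si()   # removed once: counter is back to zero
    try:
        su.dec_curr_stdby_si()  # a second removal of the same assignment
    except AssertionError as err:
        print("assertion failed:", err)
```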
[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)
If IMM has a maximum limit of 2048 for a long DN object, then AMF should reject the creation of application objects by calculating the size of the RT objects.

---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 01:01 AM UTC
**Owner:** nobody
**Attachments:**
- [2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) (159.7 kB; application/x-compressed-tar)

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & no PBE & longDn feature enabled )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Summary :
--
Long DN RT objects creation failed with ERR_TOO_LONG during unlock operation of SU.

Steps followed & Observed behaviour
--
-> Initially enabled the longDn feature.
-> Later imported the attached AMF configuration successfully.
-> Performed the unlock-in and unlock operations on the SU, for which the following error is observed in syslog.
Sep 10 16:11:43 CONTROLLER-2 osafamfnd[4279]: NO Assigned 'safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz' ACTIVE to 'safSu=SU1,safSg=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz' Sep 10 16:11:43 CONTROLLER-2 osafamfd[4265]: ER exec: create FAILED 13 Sep 10 16:11:46 CONTROLLER-2 osafamfd[4265]:** ER exec: create FAILED 13** Below is the corresponding trace in osafamfd : 
Sep 10 16:11:46.647681 osafamfd [4265:imm.cc:0396] >> execute Sep 10 16:11:46.647730 osafamfd [4265:imm.cc:0142] >> exec: Create safCsi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz_CSIA,safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxy zabcdefghijklmnopqrstuvT Sep 10 16:11:46.647783 osafamfd [4265:imma_oi_api.c:2786] >> rt_object_create_common Sep 10 16:11:46.647879 osafamfd [4265:imma_oi_api.c:2892] TR attr:safCSIComp Sep 10 16:11:46.647908 osafamfd [4265:imma_oi_api.c:2892] TR attr:saAmfCSICompHAState Sep 10 16:11:46.647927 osafamfd [4265:imma_oi_api.c:2892] TR attr:saAmfCSICompHAReadinessState Sep 10 16:11:46.649108 osafamfd [4265:imma_oi_api.c:3063] << rt_object_create_common Sep 10 16:11:46.649157 osafamfd [4265:imm.cc:0163] ER exec: create FAILED 13 --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforg
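The comment on this ticket suggests that AMF compute the size of the runtime objects up front. A minimal sketch of such a check follows, assuming the 2048 limit quoted in the comment and assuming that runtime association objects embed one DN inside another with escaped commas, the way the `safSISU=safSu=SU1\,safSg=...` names elsewhere in this archive do. The helper names are hypothetical and this is not the AMF implementation.

```python
MAX_DN_LEN = 2048  # limit quoted in the ticket comment; treated as an assumption


def escape_rdn_value(dn):
    """Escape commas so a whole DN can be used as a single RDN value."""
    return dn.replace(",", "\\,")


def assoc_dn(rdn_attr, embedded_dn, parent_dn):
    """Build an association DN such as safSISU=<escaped SU DN>,<SI DN>."""
    return "{}={},{}".format(rdn_attr, escape_rdn_value(embedded_dn), parent_dn)


def fits(dn, limit=MAX_DN_LEN):
    """True when the DN does not exceed the limit, counted in bytes."""
    return len(dn.encode("utf-8")) <= limit


if __name__ == "__main__":
    comp = "safComp=C,safSu=" + "x" * 1100
    csi = "safCsi=A,safSi=" + "y" * 1100
    rt = assoc_dn("safCSIComp", comp, csi)
    print(fits(comp), fits(csi), fits(rt), len(rt))
```

The point of the sketch is that two configuration DNs can each be within the limit while the runtime object derived from them is not, because its name contains both plus the escape characters.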
[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)
Attaching the configuration and IMMD & IMMND traces also.

Attachments:
- [2023_longDn.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b0d730dd/21a6/attachment/2023_longDn.tgz) (821.2 kB; application/x-compressed)

---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Mon Sep 12, 2016 01:59 AM UTC
**Owner:** nobody
**Attachments:**
- [2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) (159.7 kB; application/x-compressed-tar)
[tickets] [opensaf:tickets] #316 SI Assignments are not removed for a SU in Nway redundancy model
Issue of SU struck in quiesced is observed during lock operation of node group. -> Brought up Nway application with 3 SUs hosted on SC-2,PL-3 and PL-4. -> Locked a node group with only PL-3 as the member -> SU hosted on PL-3 assignments are not removed and is stuck in quiesced state. Configuration is attached. Attachments: - [createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/0143c687/6d14/attachment/createAppTestApp.sh) (15.8 kB; text/x-shellscript) --- ** [tickets:#316] SI Assignments are not removed for a SU in Nway redundancy model** **Status:** accepted **Milestone:** 4.7.2 **Created:** Fri May 24, 2013 08:39 AM UTC by Nagendra Kumar **Last Updated:** Tue Aug 09, 2016 09:48 AM UTC **Owner:** Praveen **Attachments:** - [logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/logs.tar) (2.5 MB; application/x-gzip-compressed) - [osafamfd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfd) (228.2 kB; application/octet-stream) - [osafamfnd](https://sourceforge.net/p/opensaf/tickets/316/attachment/osafamfnd) (122.8 kB; application/octet-stream) - [pl_logs.tar](https://sourceforge.net/p/opensaf/tickets/316/attachment/pl_logs.tar) (1.3 MB; application/x-gzip-compressed) Migrated from http://devel.opensaf.org/ticket/2987 changeset : 3855 Model : NWay configuration : 1App,1SG,5SU with 3comps each, 5SIs with 3csi each. si-si deps configured as SI1<-SI2<-SI3<-SI4 SIrankedSus not configured. Node mapping : SU1 on SC-1, SU2 on SC-2, SU3 on PL-3, SU4,SU5 on PL-4. While running the campaign, smf performs lock,lock-in of the activation units i.e SUs. The SIs for SU3 are not removed though SU3 is in locked-state. Subsequent unlock-in,unlock of SU3 fails. 
/var/log/messages of the active controller SC-1 shows:
Feb 3 22:45:14 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:16 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:18 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:20 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafamfd[20055]: WA SIs still assigned to this SU
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[safSu=SU3,safSg=SGONE,safApp=NWAYAPP], opId=[3]
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to call admin operation 3 on safSu=SU3,safSg=SGONE,safApp=NWAYAPP
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Failed to Terminate activation units in step=safSmfStep=0003
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step undoing failed
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: ER Step safSmfStep=0003 in procedure safSmfProc=amfClusterProc-1 failed, step result 5
Feb 3 22:45:23 linux-xc76 osafsmfd[20081]: NO CAMP: Procedure safSmfProc=amfClusterProc-1 returned FAILED

SU Assignments brief:
===
safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI3,safApp=NWAYAPP saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI2,safApp=NWAYAPP saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU3\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP saAmfSISUHAState=QUIESCED(3)
safSISU=safSu=SU4\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI5,safApp=NWAYAPP saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=SGONE\,safApp=NWAYAPP,safSi=NWAYSI1,safApp=NWAYAPP saAmfSISUHAState=ACTIVE(1)

SU States:
==
safSu=SU3,safSg=SGONE,safApp=NWAYAPP
saAmfSUAdminState=LOCKED(2)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=OUT-OF-SERVICE(1)

changed 4 months ago by bertil
- owner changed from ingber to ravisekhar
- component changed from saf/smfsv to saf/avsv
I believe this is an AMF problem. SMF only uses the AMF admin ops (lock, unlock etc).
[tickets] [opensaf:tickets] #2022 AMF : amfd asserted for NG lock operation ( quiesced timeout - Nway model))
--- ** [tickets:#2022] AMF : amfd asserted for NG lock operation ( quiesced timeout - Nway model))** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Sat Sep 10, 2016 09:58 AM UTC by Srikanth R **Last Updated:** Sat Sep 10, 2016 09:58 AM UTC **Owner:** nobody **Attachments:** - [createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/2022/attachment/createAppTestApp.sh) (15.8 kB; text/x-shellscript) Environment details -- OS : Suse 64bit Changeset : 7997 ( 5.1.FC) Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & no PBE ) AMF Application : NPM model with SUs mapped on SC-2,PL-3,PL-4 Summary : -- AMFD on both controllers asserted, if Nway application failed in CSI SET QUIESCED callback in lock operation of node group Steps followed & Observed behaviour -- -> Hosted nway application on PL-3,PL-4 and SC-2 and brought up the application. Configuration is attached to the ticket. -> Created a node group with all the three nodes. -> Ensured that one of component will not respond to quiesced callback -> Now performed the lock operation on the node group -> amfd on both controllers asserted with the following back trace. 
0 0x7f66fbc6fb55 in raise () from /lib64/libc.so.6
1 0x7f66fbc71131 in abort () from /lib64/libc.so.6
2 0x7f66fda6816a in __osafassert_fail (__file=0x51214d "su.cc", __line=2022, __func=0x513aa0 "dec_curr_stdby_si", __assertion=0x51355f "saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:281
3 0x004d68cd in AVD_SU::dec_curr_stdby_si (this=0x7ccf40) at su.cc:2022
4 0x004be804 in avd_susi_update_assignment_counters (susi=0x78c670, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at siass.cc:783
5 0x004be59b in avd_susi_del_send (susi=0x78c670) at siass.cc:714
6 0x004af12e in avd_sg_nway_node_fail_stable (cb=0x751b80, su=0x800470, susi=0x0) at sg_nway_fsm.cc:3022
7 0x004b025d in avd_sg_nway_node_fail_sg_realign (cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:3493
8 0x004a8042 in SG_NWAY::node_fail (this=0x797c50, cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:497
9 0x004b209e in sg_su_failover_func (su=0x800470) at sgproc.cc:525
10 0x004b2d16 in avd_su_oper_state_evh (cb=0x751b80, evt=0x7f66f4002940) at sgproc.cc:838
11 0x00450ba9 in process_event (cb_now=0x751b80, evt=0x7f66f4002940) at main.cc:768
12 0x004508cd in main_loop () at main.cc:689
13 0x00450e43 in main (argc=2, argv=0x7fff0f81ab18) at main.cc:841
[tickets] [opensaf:tickets] #2021 AMF : active compname is improperly populated in Standby callback (NPM)
--- ** [tickets:#2021] AMF : active compname is improperly populated in Standby callback (NPM)** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Sat Sep 10, 2016 06:52 AM UTC by Srikanth R **Last Updated:** Sat Sep 10, 2016 06:52 AM UTC **Owner:** nobody For an application with NPM model, active compName in the standby descriptor is having corrupted value in the standby callback. Breakpoint 1, pycbk_SaAmfCSISetCallbackT (invocation=4287627278, compName=0x941a28, haState=SA_AMF_HA_STANDBY, csiDescriptor=...) at saAmf_wrap.c:2914 2914saAmf_wrap.c: No such file or directory. (gdb) p csiDescriptor $1 = {csiFlags = 1, csiName = {length = 48, value = "safCsi=CSI1,safSi=TestApp_SI4,safApp=TestApp_Npm", '\000' }, csiStateDescriptor = {activeDescriptor = {transitionDescriptor = 1634926660, activeCompName = {length = 0, value = "\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm", '\000' }}, standbyDescriptor = {activeCompName = { length = 68, value = "**sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm**", '\000' }, standbyRank = 0}}, csiAttr = {attr = 0x7642a0, number = 1}} In the above callback ( in gdb ), the active component name in standby descriptor in standby callback should be safComp=COMP1,safSu=TestApp_SU3,safSg=TestApp_SG1,safApp=TestApp_Npm, but it is populated with improper value : sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npmapo --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2020 AMF : Additional features for csiAttributeChangeCallback
---
** [tickets:#2020] AMF : Additional features for csiAttributeChangeCallback **
**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 05:53 AM UTC by Srikanth R
**Last Updated:** Sat Sep 10, 2016 05:53 AM UTC
**Owner:** nobody

The following features can additionally be considered for the csiAttributeChangeCallback implementation.

-> Currently both the active and the standby component receive csiAttributeChangeCallback simultaneously. The callback should instead be handled like the csiSet callback: the component with the active assignment should receive the callback first, and the standby should receive it afterwards. A user application may have a scenario in which the standby tries to access an object that is associated with a CSI and is supposed to be created by the active. If both components get the callback simultaneously, the standby may behave erroneously if it processes the callback before a busy active component does.

-> Currently, csiAttributeChangeCallback is invoked only when values are added to an existing csi attrib class. If a new csi attribute class is created, the callback is not invoked. The callback should be invoked for every modification of csi attrib objects; all of the operations create, modify and delete should be supported.
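The ordering requested in the first point can be sketched as follows. This is only a minimal Python model, not amfd/amfnd code: `Assignment`, `deliver_attr_change` and the blocking `invoke` callable are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical record: one component and the HA state of its CSI assignment.
Assignment = namedtuple("Assignment", ["comp", "ha_state"])

# Delivery priority: active assignments first, then standby, then anything else.
_ORDER = {"ACTIVE": 0, "STANDBY": 1}


def deliver_attr_change(assignments, invoke):
    """Invoke the callback on each component, active assignments first.

    `invoke(comp)` is assumed to block until the component has responded,
    so a standby is only called after every active has finished.
    Returns the component names in the order they were called.
    """
    delivered = []
    for a in sorted(assignments, key=lambda a: _ORDER.get(a.ha_state, 2)):
        invoke(a.comp)
        delivered.append(a.comp)
    return delivered
```

With this ordering a standby that looks up an object created by the active always runs after the active has handled the change, which is the same guarantee the ticket describes for the csiSet callback.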
[tickets] [opensaf:tickets] #326 amf: proxied SU's presence state hangs at INSTANTIATING state.
Even for failure in during csi attribute change callback timeout, proxied SU got struck in INSTANTIATING state. Sep 9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 'safComp=proxied,safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' recovery action escalated from 'componentRestart' to 'suFailover' Sep 9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 'safComp=proxied,safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' faulted due to 'csiAttributeChangeCallbackTimeout' : Recovery is 'suFailover' Sep 9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO Terminating components of 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N'(abruptly & unordered) Sep 9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State INSTANTIATED => TERMINATING Sep 9 15:15:22 SLES-SLOT4 osafamfnd[25941]: NO 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State TERMINATING => TERMINATING Sep 9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State TERMINATING => UNINSTANTIATED Sep 9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO Terminated all components in 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Sep 9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO Informing director of sufailover Sep 9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO Repair request for 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Sep 9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State UNINSTANTIATED => UNINSTANTIATED Sep 9 15:15:27 SLES-SLOT4 osafamfnd[25941]: NO 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N' Presence State UNINSTANTIATED => INSTANTIATING --- ** [tickets:#326] amf: proxied SU's presence state hangs at INSTANTIATING state.** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Fri May 24, 2013 09:34 AM UTC by Praveen **Last Updated:** Wed May 04, 2016 07:20 PM 
UTC **Owner:** nobody Migrated from http://devel.opensaf.org/ticket/2213. setup: 1 controller Model observed: TwoN Configuration of proxy : 1 App, 1SG, 1SU, 1 proxy comps Configuration of proxied : 1App, 1SG, 1SU, 1 proxied component with saAmfCtCompCategory=12 The proxy code is modelled to respond to amf with ERR_FAILED_OP inside SaAmfProxiedComponentInstantiateCallback?() api By default, the SU's of proxy and proxied are in locked-instantiation state. Scenario: Bringup the proxy and proxied configuration. Do unlock-in and unlock of the proxy. The proxy should be up and running, and the proxied registration should be successful. Now do unlock-in of proxied SU. The below is the console output console text: amf-adm unlock-in safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5) Retrying again gives the below output. SLES11-SLOT-2:/home/surender/amf # amf-adm unlock-in safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_TRY_AGAIN (6) SLES11-SLOT-2:/home/surender/amf # amf-adm unlock-in safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_TRY_AGAIN (6) /var/log/messages output for above op's: Oct 11 15:13:15 SLES11-SLOT-2 osafamfnd[3852]: saAmfCtDefQuiescingCompleteTimeout for 'safVersion=4.0.0,safCompType=Comp_nored' initialized with saAmfCtDefCallbackTimeout Oct 11 15:13:15 SLES11-SLOT-2 osafamfnd[3852]: 'safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp' Presence State UNINSTANTIATED => INSTANTIATING Oct 11 15:13:16 SLES11-SLOT-2 osafamfnd[3852]: 'safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp' Presence State INSTANTIATING => INSTANTIATED Oct 11 15:13:16 SLES11-SLOT-2 osafamfnd[3852]: saAmfCtDefQuiescingCompleteTimeout for 'safVersion=4.0.0,safCompType=Comp_pxd_basetype' initialized with saAmfCtDefCallbackTimeout Oct 11 15:13:41 SLES11-SLOT-2 osafamfnd[3852]: 
'safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App' Presence State UNINSTANTIATED => INSTANTIATING Oct 11 15:15:55 SLES11-SLOT-2 osafamfd[3711]: Admin operation is already going Oct 11 15:15:58 SLES11-SLOT-2 osafamfd[3711]: Admin operation is already going SU states of proxy and proxied: safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App saAmfSUAdminState=LOCKED(2) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATING(2) saAmfSUReadinessState=OUT-OF-SERVICE(1) Comp state of proxy and proxied: safComp=mycomp,safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp saAmfCompOperState=ENABLED(1) saAmfCompPresenceState=INSTANTIATED(3) saAmfCompReadinessState=IN-SERVICE(
[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover
-> In addition to the steps mentioned in the ticket, for the below operations following message is printed in syslog. Sep 8 12:06:29 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12 Sep 8 12:06:35 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12 Sep 8 12:06:45 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12 Sep 8 12:06:55 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12 Below are the steps. -> Delete all the application objects. -> Perform the middleware switchover / failover. -> New active controller is trying to access the application SI object which is already deleted earlier. Sep 8 12:08:36.647738 osafamfd [:main.cc:0810] << process_event Sep 8 12:08:36.647743 osafamfd [:imm.cc:0396] >> execute Sep 8 12:08:36.647748 osafamfd [:imm.cc:0142] >> exec: Create safCsi=CSI1,safSi=TestApp_SI4,safApp=TestApp_TwoN Sep 8 12:08:36.647754 osafamfd [:imma_oi_api.c:2786] >> rt_object_create_common Sep 8 12:08:36.647761 osafamfd [:imma_oi_api.c:2892] TR attr:safCSIComp Sep 8 12:08:36.647768 osafamfd [:imma_oi_api.c:2892] TR attr:saAmfCSICompHAState Sep 8 12:08:36.647795 osafamfd [:imma_oi_api.c:2892] TR attr:saAmfCSICompHAReadinessState Sep 8 12:08:36.650289 osafamfd [:imma_oi_api.c:3063] << rt_object_create_common Sep 8 12:08:36.650330 osafamfd [:imm.cc:0163] ER exec: create FAILED 12 --- ** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware failover** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R **Last Updated:** Thu Sep 08, 2016 06:09 AM UTC **Owner:** nobody Environment details -- OS : Suse 64bit Changeset : 7997 ( 5.1.FC) Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & no PBE ) AMF Application : 2N model with SUs mapped on PL-3,PL-4 ( si-si deps enabled) Summary : -- Application SIs are moving to UNASSIGNED state after middleware failover. Steps followed & Observed behaviour -- -> Initially brought up AMF application (2n model) on two payloads. 
-> All the SIs are fully assigned state and SUs are in INSERVICE state. -> Performed middleware failover. -> After standby became active controller, SIs moved to unassigned state. But 'amf-state siass' is showing proper output. -> Application received CSI remove callbacks after locking the SUs Expected behaviour -- -> As no fault happened on the application, SIs should not move to UNASSIGNED state for middleware failover. --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
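The numeric code in "create FAILED 12" is an SaAisErrorT value. A small Python helper such as the one below can decode these codes when reading syslog; the table is a subset of the values defined in saAis.h, and the function name and regular expression are illustrative only.

```python
import re

# Subset of SaAisErrorT values as defined in the SA Forum header saAis.h.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    3: "SA_AIS_ERR_VERSION",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    13: "SA_AIS_ERR_NAME_TOO_LONG",
    14: "SA_AIS_ERR_EXIST",
    31: "SA_AIS_ERR_UNAVAILABLE",
}

# Matches the amfd message format seen in the log excerpt above.
_CREATE_FAILED = re.compile(r"create FAILED (\d+)")


def decode_create_failures(log_text):
    """Return the symbolic error name for every 'create FAILED <n>' in the text."""
    return [
        SA_AIS_ERRORS.get(int(code), "unknown(%s)" % code)
        for code in _CREATE_FAILED.findall(log_text)
    ]
```

Applied to the lines quoted in this ticket it reports SA_AIS_ERR_NOT_EXIST, which matches the description that the new active controller refers to an SI object that was deleted earlier.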
[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover
amfd traces on both the controllers Attachments: - [2009.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/98b72c10/7108/attachment/2009.tgz) (849.1 kB; application/x-compressed-tar) --- ** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware failover** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R **Last Updated:** Thu Sep 08, 2016 06:07 AM UTC **Owner:** nobody Environment details -- OS : Suse 64bit Changeset : 7997 ( 5.1.FC) Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & no PBE ) AMF Application : 2N model with SUs mapped on PL-3,PL-4 ( si-si deps enabled) Summary : -- Application SIs are moving to UNASSIGNED state after middleware failover. Steps followed & Observed behaviour -- -> Initially brought up AMF application (2n model) on two payloads. -> All the SIs are fully assigned state and SUs are in INSERVICE state. -> Performed middleware failover. -> After standby became active controller, SIs moved to unassigned state. But 'amf-state siass' is showing proper output. -> Application received CSI remove callbacks after locking the SUs Expected behaviour -- -> As no fault happened on the application, SIs should not move to UNASSIGNED state for middleware failover. --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1999 osafntfd on active controller crashed while logging to alarm stream
- **summary**: LOG : ntfd on active controller crashed while logging to alarm stream --> osafntfd on active controller crashed while logging to alarm stream - **Component**: log --> ntf - **Comment**: After the integration of LOG with CLM (#1638), all LOG clients should reinitialize after CLM unlock operation. It might be that , NTF as a LOG client is not reinitializing after CLM unlock and got the return value 31. --- ** [tickets:#1999] osafntfd on active controller crashed while logging to alarm stream** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R **Last Updated:** Tue Sep 06, 2016 08:09 AM UTC **Owner:** nobody Environment details -- OS : Suse 64bit Changeset : 7997 ( 5.1.FC) Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & no PBE ) AMF Application : 2N model with SUs mapped on PL-3,PL-4 Summary : -- NTFD crashed on active controller, while logging notification to alarm stream. Steps followed & Observed behaviour -- -> Initially performed couple of switchovers and tests on AMF application. -> Performed CLM lock operation of standby SC-1 and later unlocked. -> Performed switchover such that SC-1 became active controller. -> Stopped opensafd on PL-4. NTFD on active controller crashed. Sep 6 10:18:25 CONTROLLER-1 osafamfd[2262]: NO Node 'PL-4' left the cluster .. Sep 6 10:18:25 CONTROLLER-1 osafntfd[2242]: osaf_abort(31) called from 0x414d1e with errno=11 Sep 6 10:18:25 CONTROLLER-1 osafamfnd[2272]: NO 'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast' -> Below is the excerpt from the ntfd trace. 
Sep 6 10:18:25.436394 osafntfd [2242:NtfAdmin.cc:0252] T2 New notification received, id: 682 Sep 6 10:18:25.436398 osafntfd [2242:NtfAdmin.cc:0187] >> processNotification Sep 6 10:18:25.436404 osafntfd [2242:NtfNotification.cc:0045] T3 constructor 0x685790, notId: 682 Sep 6 10:18:25.436409 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header Sep 6 10:18:25.436412 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header Sep 6 10:18:25.436425 osafntfd [2242:NtfAdmin.cc:0200] T2 notification 682 with type 16384 added, notificationMap size is 1 Sep 6 10:18:25.436431 osafntfd [2242:NtfLogger.cc:0130] >> log Sep 6 10:18:25.436435 osafntfd [2242:NtfLogger.cc:0132] T2 notification Id=682 received in logger with size 0 Sep 6 10:18:25.436439 osafntfd [2242:NtfLogger.cc:0135] T2 IS LOCAL, logging Sep 6 10:18:25.436442 osafntfd [2242:NtfLogger.cc:0166] >> checkQueueAndLog Sep 6 10:18:25.436447 osafntfd [2242:NtfLogger.cc:0196] >> logNotification Sep 6 10:18:25.436452 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header Sep 6 10:18:25.436455 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header Sep 6 10:18:25.436460 osafntfd [2242:NtfLogger.cc:0231] T2 Logging notification to alarm stream Sep 6 10:18:25.436495 osafntfd [2242:lga_api.c:1151] >> saLogWriteLogAsync Sep 6 10:18:25.436500 osafntfd [2242:lga_api.c:1015] >> handle_log_record Sep 6 10:18:25.436507 osafntfd [2242:lga_api.c:1110] << handle_log_record Sep 6 10:18:25.436518 osafntfd [2242:lga_api.c:1229] TR **saLogWriteLogAsync Node not CLM member or stale client** Sep 6 10:18:25.436524 osafntfd [2242:lga_api.c:1320] << saLogWriteLogAsync Sep 6 10:18:42.472616 osafntfd [2176:ntfs_main.c:0181] >> initialize --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. 
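Return value 31 is SA_AIS_ERR_UNAVAILABLE. The reinitialization mentioned in the comment could look roughly like the Python sketch below. It is a model only: `write` stands in for the log write call and `reinitialize` for whatever finalize/initialize/stream-open sequence the client needs; both are hypothetical callables, and the retry limit is an assumption.

```python
SA_AIS_OK = 1
SA_AIS_ERR_UNAVAILABLE = 31


def write_with_reinit(write, reinitialize, record, max_reinit=1):
    """Write a record, re-creating a stale client once if the service says so.

    `write(record)` returns an SaAisErrorT-style integer.
    `reinitialize()` is assumed to obtain fresh handles for the client.
    The last return code is handed back so the caller can decide what to do
    instead of aborting.
    """
    rc = write(record)
    attempts = 0
    while rc == SA_AIS_ERR_UNAVAILABLE and attempts < max_reinit:
        reinitialize()
        attempts += 1
        rc = write(record)
    return rc
```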
[tickets] [opensaf:tickets] #2002 CLM : Agent crashed for invalid check in buffer notification parameter
---
** [tickets:#2002] CLM : Agent crashed for invalid check in buffer notification parameter**
**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 08:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:15 AM UTC
**Owner:** nobody

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Steps followed & Observed behaviour
--
-> Call the saClmClusterTrack_4 api with the CURRENT flag and the buffer parameter populated. Here the buffer parameter is populated by allocating sufficient memory for numberOfItems entries, but the notification array holds garbage values. If the notification holds garbage values, the agent crashes with the following back trace.
-> #3 0x7f4ccb370c9f in osaf_extended_name_length (name=0x9d5e4e) at osaf_extended_name.c:139
-> #4 0x7f4cca9ff27c in clma_validate_flags_buf_4 (hdl_rec=0x97cbc0, flags=1 '\001', buf=0x97c190) at clma_api.c:183
-> #5 0x7f4ccaa00fe5 in clmaclustertrack (clmHandle=4290772993, flags=1 '\001', buf=0x0, buf_4=0x97c190) at clma_api.c:1032
-> #6 0x7f4ccaa00d40 in saClmClusterTrack_4 (clmHandle=4290772993, flags=1 '\001', buf=0x97c190) at clma_api.c:958

Expected behaviour
--
If the buffer parameter is NULL, CLM shall invoke a callback. If the buffer parameter is not NULL, CLM should check only the value of numberOfItems and evaluate whether the user has allocated sufficient memory. With the #1906 changes, the contents of the notification array are also verified, but only the structure member numberOfItems should be verified.
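The expected check can be modelled in a few lines of Python. This is a sketch under assumptions, not the clma_api.c logic: `TrackBuffer` and `validate_track_buffer` are hypothetical names, and returning SA_AIS_ERR_NO_SPACE for an undersized buffer is assumed here.

```python
from dataclasses import dataclass
from typing import Any

SA_AIS_OK = 1
SA_AIS_ERR_NO_SPACE = 15


@dataclass
class TrackBuffer:
    """Hypothetical stand-in for the caller-supplied notification buffer."""
    number_of_items: int
    notification: Any  # caller-allocated memory; contents may be uninitialised


def validate_track_buffer(buf, cluster_size):
    """Validate a CURRENT-flag track request without reading buffer contents.

    buf is None     -> the result is delivered through the track callback.
    buf is not None -> only number_of_items is compared with the cluster size;
                       buf.notification is never dereferenced.
    """
    if buf is None:
        return SA_AIS_OK
    if buf.number_of_items < cluster_size:
        return SA_AIS_ERR_NO_SPACE
    return SA_AIS_OK
```

Because the notification field is never inspected, garbage in caller-allocated memory cannot reach a routine such as osaf_extended_name_length during validation.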
[tickets] [opensaf:tickets] #1995 AMF : amfd crashed while dumping AMF state
--- ** [tickets:#1995] AMF : amfd crashed while dumping AMF state** **Status:** unassigned **Milestone:** 5.1.RC1 **Created:** Fri Sep 02, 2016 08:42 AM UTC by Srikanth R **Last Updated:** Fri Sep 02, 2016 08:42 AM UTC **Owner:** nobody Changeset : 7997 5.1 FC AMFD crashed while dumping the amf state, with the following command. immadm -a @safAmfService2020f -o 99 @safAmfService2020f Sep 2 12:51:26 CONTROLLER-2 osafamfd[2691]: NO unknown type: @safAmfService2020f Sep 2 12:51:26 CONTROLLER-2 osafamfd[2691]: imm.cc:648: object_name_to_class_type: Assertion 'false' failed. Sep 2 12:51:26 CONTROLLER-2 osafamfnd[2701]: WA AMF director unexpectedly crashed Sep 2 12:51:26 CONTROLLER-2 osafamfnd[2701]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60 --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1991 AMF: Existing PG tracking should not be stopped for CURRENT flag
---

** [tickets:#1991] AMF: Existing PG tracking should not be stopped for CURRENT flag**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Aug 31, 2016 09:44 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 09:44 AM UTC
**Owner:** nobody

5.1.FC : changeset - 6997

Issue : Existing PG tracking should not be stopped by a CURRENT call.

Steps performed :
-> Call saAmfInitialize_4()
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CHANGES flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrackStop()

Observed output : TrackStop returns ERR_NOT_EXIST, indicating that tracking was not started earlier.

Expected output : The TrackStop() API should return SA_AIS_OK. In the earlier release the API returned SA_AIS_OK.

According to the B04.01 spec, section 7.11.1, page 318, tracking should not be stopped until TrackStop() is called explicitly:

Once saAmfProtectionGroupTrack_4() has been called with trackFlags containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification callbacks can only be stopped by an invocation of saAmfProtectionGroupTrackStop().
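The expected semantics quoted from the spec can be illustrated with a small state model. This is only a sketch, not OpenSAF code: the class `PgTrackerModel` and its methods are hypothetical, and the flag and return-code values are assumed to mirror the SA Forum AIS headers.

```python
from enum import IntFlag


class TrackFlag(IntFlag):
    # Assumed to mirror SA_TRACK_* in the SA Forum AIS headers.
    CURRENT = 0x01
    CHANGES = 0x02
    CHANGES_ONLY = 0x04


SA_AIS_OK = 1
SA_AIS_ERR_NOT_EXIST = 12


class PgTrackerModel:
    """Hypothetical model of protection group tracking state per CSI."""

    def __init__(self):
        self._tracked = {}  # CSI name -> ongoing CHANGES / CHANGES_ONLY flags

    def track(self, csi, flags):
        ongoing = flags & (TrackFlag.CHANGES | TrackFlag.CHANGES_ONLY)
        if ongoing:
            self._tracked[csi] = ongoing
        # A CURRENT-only request is a one-shot query, so any tracking
        # that was started earlier is deliberately left untouched.
        return SA_AIS_OK

    def track_stop(self, csi):
        if csi not in self._tracked:
            return SA_AIS_ERR_NOT_EXIST
        del self._tracked[csi]
        return SA_AIS_OK


# The call sequence from the ticket: CURRENT, CHANGES, CURRENT, then stop.
model = PgTrackerModel()
model.track("safCsi=demo", TrackFlag.CURRENT)
model.track("safCsi=demo", TrackFlag.CHANGES)
model.track("safCsi=demo", TrackFlag.CURRENT)
first_stop = model.track_stop("safCsi=demo")
second_stop = model.track_stop("safCsi=demo")
```

In this model the first stop succeeds because the CHANGES tracking survives the later CURRENT call, and only a second stop reports NOT_EXIST.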
[tickets] [opensaf:tickets] #1990 AMF : Extra notification is received for lock operation on unlocked SG.
---

** [tickets:#1990] AMF : Extra notification is received for lock operation on unlocked SG.**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Aug 31, 2016 06:40 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 06:40 AM UTC
**Owner:** nobody

Changeset : 5.1 FC (7997 changeset)

An extra notification is received for a lock operation on an unlocked SG.

amf-adm lock safSg=AmfDemo,safApp=AmfDemo

=== Aug 30 15:22:27 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_UNLOCKED
New State: SA_AMF_ADMIN_LOCKED

=== Aug 30 15:22:27 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED
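The extra notification in this ticket is recognizable because its old and new states are identical. A test harness could flag such records with a check like the one below. This is a sketch only: the `StateChange` tuple and `redundant_changes` helper are hypothetical names, and the records are typed in by hand from the output above rather than read through the NTF API.

```python
from typing import NamedTuple


class StateChange(NamedTuple):
    notification_object: str
    state_id: str
    old_state: str
    new_state: str


def redundant_changes(notifications):
    """Return the state-change records that report no actual transition."""
    return [n for n in notifications if n.old_state == n.new_state]


SG = "safSg=AmfDemo,safApp=AmfDemo"
observed = [
    StateChange(SG, "SA_AMF_ADMIN_STATE",
                "SA_AMF_ADMIN_UNLOCKED", "SA_AMF_ADMIN_LOCKED"),
    StateChange(SG, "SA_AMF_ADMIN_STATE",
                "SA_AMF_ADMIN_LOCKED", "SA_AMF_ADMIN_LOCKED"),
]
extra = redundant_changes(observed)
```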
[tickets] [opensaf:tickets] #1926 pyosaf: utils/ntf fails to set additional text
- **status**: review --> fixed
- **Milestone**: 5.0.1 --> 5.1.FC
- **Comment**:

changeset: 7968:7e5ae40512d1
tag: tip
user: Johan Mårtensson
date: Mon Aug 29 17:02:21 2016 +0530
summary: pyosaf: Fix handling of additionalText field in notification headers [#1926]

---

** [tickets:#1926] pyosaf: utils/ntf fails to set additional text**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Thu Jul 21, 2016 08:08 AM UTC by Johan Mårtensson
**Last Updated:** Thu Jul 21, 2016 08:37 AM UTC
**Owner:** Johan Mårtensson

Additional text is not set correctly in the notification header.
[tickets] [opensaf:tickets] #1951 Opensaf support for TIPC as built in module
---

** [tickets:#1951] Opensaf support for TIPC as built in module**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Aug 15, 2016 07:00 AM UTC by Srikanth R
**Last Updated:** Mon Aug 15, 2016 07:00 AM UTC
**Owner:** nobody

Setup : Montavista Linux with 5.0 GA

Opensafd fails to start when TIPC is a built-in kernel module. On other operating systems such as SUSE and OEL, opensafd starts successfully by loading the kernel module dynamically.

The configure_tipc file assumes that if TIPC is already loaded in the kernel, TIPC configuration was done before opensafd started. On an operating system with TIPC built into the kernel, however, opensafd fails with the following output:

TIPC node address not configured to OpenSAF requirements, exiting..
[tickets] [opensaf:tickets] #1890 Doc : Headless feature documentation
---

** [tickets:#1890] Doc : Headless feature documentation **

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Tue Jun 21, 2016 11:02 AM UTC by Srikanth R
**Last Updated:** Tue Jun 21, 2016 11:02 AM UTC
**Owner:** nobody

Version : Opensaf 5.0 GA

1) Documentation about the headless feature should be updated in Opensaf_Overview_PR.odt / Opensaf_Extentsions. The documentation should list the services that provide functionality when the cluster goes headless.

2) The README.HYDRA file in the ntfsv folder should be renamed to README.HEADLESS so that file naming is uniform across all folders.

3) The CLM folder doesn't have a README for the headless feature.

4) The headless files across all folders should follow the same naming convention.

./osaf/services/saf/amf/README_HEADLESS
./osaf/services/saf/logsv/README-HEADLESS
./osaf/services/saf/cpsv/README.HEADLESS
[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless
For a fault during headless, AMF is leaving the application in the same state with the following update in syslog on SU hosted payload. Sep 7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : SI=safSi=TestApp_SI1,safApp=TestApp_TwoN Sep 7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : SI=safSi=TestApp_SI2,safApp=TestApp_TwoN Sep 7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : SI=safSi=TestApp_SI3,safApp=TestApp_TwoN Sep 7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : SI=safSi=TestApp_SI4,safApp=TestApp_TwoN In the above situation, application with active assignment faulted during headless and node went for reboot. Once the controller joins , the above syslog is printed and the application is left with ONLY standby assignment. If AMF application is left with improper assignments and this ticket is targeting the above scenario and others like #1869, then this ticket should be marked as **defect**. --- ** [tickets:#1725] AMF: Recover transient SUSIs left over from headless** **Status:** accepted **Milestone:** 5.1.FC **Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau **Last Updated:** Thu May 05, 2016 12:22 PM UTC **Owner:** Minh Hon Chau This ticket is more likely an enhancement that targets on how AMFD detect and recover the transients SUSI left over from headless. There are three major situations: (1) - Cluster goes headless, su/node failover on any payloads can happen, then cluster recover (2) - issue admin op on any AMF entities, cluster goes headless. 
During headless, the intermediate HA assignments of the admin op sequence between AMFND and the components could be in one of these situations:

(2.1) The assignment completes and the component returns OK from the csi callback, then the cluster recovers.

(2.2) The assignment is still ongoing when the cluster recovers. Afterwards the assignment could complete, the csi callback could return FAILED_OPERATION, or another error could occur.

At the time the cluster recovers, amfd has collected all assignments from all amfnd(s). These assignments can be in assigned or assigning states while their HA states do not conform to the SG redundancy model.

Any of (1), (2.1) and (2.2) can happen in combination: while an admin op (2) is being issued, the cluster can go headless and any kind of failover (1) can happen during headless.
[tickets] [opensaf:tickets] #1886 CLM : Initialize API returns 31, once controllers join back from headless
--- ** [tickets:#1886] CLM : Initialize API returns 31, once controllers join back from headless** **Status:** unassigned **Milestone:** 5.0.1 **Created:** Mon Jun 20, 2016 10:32 AM UTC by Srikanth R **Last Updated:** Mon Jun 20, 2016 10:32 AM UTC **Owner:** nobody **Attachments:** - [clmd_1888](https://sourceforge.net/p/opensaf/tickets/1886/attachment/clmd_1888) (490.9 kB; application/octet-stream) setup: Version - opensaf 5.0.GA 5-Node cluster( 3 controllers and PL:4,PL-5 Payloads) Create headless scenario and call saClmInitialize_4 api on a healthy payload PL-5. The saClmInitialize_4 should return TRY_AGAIN untill the controllers are up. In some cases, 31 return code is returned. This is observed 2 out of 5 times. MBCSV:MBCA:OFF Aug 2 20:56:58.379408 clma [13326:clma_mds.c:1124] >> clma_mds_init Aug 2 20:56:58.379582 clma [13326:clma_mds.c:1170] << clma_mds_init Aug 2 20:57:26.122341 clma [13326:clma_mds.c:0947] T2 CLMA Rcvd MDS subscribe evt from svc 34 Aug 2 20:57:26.122361 clma [13326:clma_mds.c:0978] T2 MSG from CLMS NCSMDS_NEW_ACTIVE/UP Aug 2 20:57:26.123651 clma [13326:clma_util.c:0120] << clma_startup: rc: 1, clma_use_count: 1 Aug 2 20:57:26.123665 clma [13326:clma_mds.c:1227] >> clma_mds_msg_sync_send Aug 2 20:57:26.123704 clma [13326:clma_mds.c:0317] >> clma_mds_enc Aug 2 20:57:26.123717 clma [13326:clma_mds.c:0352] T2 msgtype: 0 Aug 2 20:57:26.123723 clma [13326:clma_mds.c:0366] T2 api_info.type: 0 Aug 2 20:57:26.123729 clma [13326:clma_mds.c:0045] >> clma_enc_initialize_msg Aug 2 20:57:26.123735 clma [13326:clma_mds.c:0060] << clma_enc_initialize_msg Aug 2 20:57:26.123742 clma [13326:clma_mds.c:0407] << clma_mds_enc Aug 2 20:57:26.152653 clma [13326:clma_mds.c:0697] >> clma_mds_dec Aug 2 20:57:26.152674 clma [13326:clma_mds.c:0729] T2 CLMSV_CLMA_API_RESP_MSG rc = 31 Aug 2 20:57:26.152682 clma [13326:clma_mds.c:0809] << clma_mds_dec Aug 2 20:57:26.152717 clma [13326:clma_mds.c:1253] << clma_mds_msg_sync_send Aug 2 20:57:26.152728 clma 
[13326:clma_api.c:0636] TR CLMS return FAILED Aug 2 20:57:26.152752 clma [13326:clma_util.c:0656] >> clma_msg_destroy Aug 2 20:57:26.153200 clma [13326:clma_util.c:0680] << clma_msg_destroy Aug 2 20:57:26.153219 clma [13326:clma_api.c:0663] T2 CLMA INIT FAILED Aug 2 20:57:26.153226 clma [13326:clma_util.c:0133] >> clma_shutdown: clma_use_count: 1 Aug 2 20:57:26.153232 clma [13326:clma_mds.c:1190] >> clma_mds_finalize Aug 2 20:57:26.153412 clma [13326:clma_mds.c:1203] << clma_mds_finalize Aug 2 20:57:26.153580 clma [13326:sysf_def.c:0153] TR DESTROYING LEAP ENVIRONMENT Aug 2 20:57:26.153663 clma [13326:sysf_def.c:0170] TR DONE DESTROYING LEAP ENVIRONMENT Aug 2 20:57:26.153679 clma [13326:clma_util.c:0146] << clma_shutdown: rc: 1, clma_use_count: 0 Aug 2 20:57:26.153686 clma [13326:clma_api.c:0668] << clmainitialize Aug 2 20:57:26.153692 clma [13326:clma_api.c:0580] << saClmInitialize_4 CLM director trace on new active controller is attached.
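From the caller's side, the expectation in this ticket is that an application can keep retrying saClmInitialize_4 while TRY_AGAIN is returned. The sketch below shows such a retry loop around a stand-in callable. It does not call the real CLM library; `initialize_with_retry` is a hypothetical helper, and the numeric codes are assumed to follow the SaAisErrorT enumeration in the AIS headers, where 31 would correspond to SA_AIS_ERR_UNAVAILABLE (the ticket itself only reports the number).

```python
import time

# Assumed SaAisErrorT values from the SA Forum AIS headers.
SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6
SA_AIS_ERR_UNAVAILABLE = 31


def initialize_with_retry(initialize, attempts=10, delay=0.0):
    """Call `initialize` until it stops returning TRY_AGAIN.

    `initialize` is any zero-argument callable returning an AIS return
    code; here it stands in for a wrapper around saClmInitialize_4.
    """
    rc = SA_AIS_ERR_TRY_AGAIN
    for _ in range(attempts):
        rc = initialize()
        if rc != SA_AIS_ERR_TRY_AGAIN:
            break
        time.sleep(delay)
    return rc


# Expected case: TRY_AGAIN until the controllers are back, then OK.
expected_codes = iter([SA_AIS_ERR_TRY_AGAIN, SA_AIS_ERR_TRY_AGAIN, SA_AIS_OK])
expected_rc = initialize_with_retry(lambda: next(expected_codes))

# Case from the ticket: the loop ends early on return code 31.
observed_codes = iter([SA_AIS_ERR_TRY_AGAIN, SA_AIS_ERR_UNAVAILABLE])
observed_rc = initialize_with_retry(lambda: next(observed_codes))
```

A loop like this treats any code other than TRY_AGAIN as final, which is why an unexpected 31 surfaces to the application as a failed initialization.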
[tickets] [opensaf:tickets] #1885 CLM : LIbrary gives false success for couple of APIs , once controller joins back from headless
CLM agent trace Attachments: - [clma_agent.txt](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b0de1c47/b115/attachment/clma_agent.txt) (10.5 kB; text/plain) --- ** [tickets:#1885] CLM : LIbrary gives false success for couple of APIs , once controller joins back from headless** **Status:** unassigned **Milestone:** 5.0.1 **Created:** Mon Jun 20, 2016 09:17 AM UTC by Srikanth R **Last Updated:** Mon Jun 20, 2016 09:17 AM UTC **Owner:** nobody Setup : 5 nodes setup with 3 controllers. Version : opensaf 5.0 GA Steps performed : -> Invoke saClmInitialize_4 -> Create a thread by calling saClmDispatch with DISPATCH_BLOCKING as argument. -> Invoke saClmClusterNodeGet_4 -> Create headless state. -> Invoke saClmClusterTrack_4 with TRACK_CURRENT and TRACK_START_STEP -> Invoke saClmClusterNodeGet_4 Observed behavior : The first three apis successfully returned SA_AIS_OK. Once the headless scenario is induced, saClmClusterTrack_4 api returned TRY_AGAIN until one of the controller joined as active controller. Here the api returned SA_AIS_OK, but no callback with CURRENT nodes info is delivered. The thread in which Dispatch is called, returned with SA_AIS_OK. Even though internally, the handle is marked as BAD_HANDLE. The subsequent calls to saClmClusterTrack and saClmClusterNodeGet_4 returned successfully. 
Aug 2 19:36:22.990719 clma [10058:clma_api.c:1035] TR RC before give handle flagsTrack 6 Aug 2 19:36:22.990730 clma [10058:clma_api.c:1038] << clmaclustertrack Aug 2 19:36:22.990740 clma [10058:clma_api.c:0938] << saClmClusterTrack_4 Aug 2 19:36:26.998572 clma [10058:clma_api.c:0934] >> saClmClusterTrack_4 Aug 2 19:36:26.998625 clma [10058:clma_api.c:0968] >> clmaclustertrack Aug 2 19:36:26.998636 clma [10058:clma_api.c:0986] TR CLMS down Aug 2 19:36:26.998642 clma [10058:clma_api.c:1035] TR RC before give handle flagsTrack 6 Aug 2 19:36:26.998648 clma [10058:clma_api.c:1038] << clmaclustertrack Aug 2 19:36:26.998657 clma [10058:clma_api.c:0938] << saClmClusterTrack_4 Aug 2 19:36:30.837965 clma [10058:clma_mds.c:0947] T2 CLMA Rcvd MDS subscribe evt from svc 34 Aug 2 19:36:30.837983 clma [10058:clma_mds.c:0978] T2 MSG from CLMS NCSMDS_NEW_ACTIVE/UP Aug 2 19:36:30.837989 clma [10058:clma_mds.c:0989] TR ** Marking handle as BAD** Aug 2 19:36:30.839058 clma [10058:sysf_ipc.c:0363] TR IN LEAP_DBG_SINK Aug 2 19:36:30.839070 clma [10058:clma_util.c:0625] << clma_hdl_cbk_dispatch Aug 2 19:36:30.839076 clma [10058:clma_api.c:0793] << saClmDispatch Aug 2 19:36:31.259065 clma [10058:clma_api.c:0934] >> saClmClusterTrack_4 Aug 2 19:36:31.259088 clma [10058:clma_api.c:0968] >> clmaclustertrack Aug 2 19:36:31.259097 clma [10058:clma_util.c:0036] >> clma_validate_version Aug 2 19:36:31.259103 clma [10058:clma_util.c:0042] << clma_validate_version Aug 2 19:36:31.259108 clma [10058:clma_api.c:1009] TR B.4.1 version Aug 2 19:36:31.259113 clma [10058:clma_api.c:0140] >> clma_validate_flags_buf_4: flags=0x15 Aug 2 19:36:31.259118 clma [10058:clma_api.c:0176] << clma_validate_flags_buf_4 Aug 2 19:36:31.259124 clma [10058:clma_api.c:1020] TR RC after validate flagsTrack 1 Aug 2 19:36:31.259129 clma [10058:clma_util.c:0036] >> clma_validate_version Aug 2 19:36:31.259140 clma [10058:clma_util.c:0042] << clma_validate_version Aug 2 19:36:31.259145 clma [10058:clma_mds.c:1274] >> 
clma_mds_msg_async_send Aug 2 19:36:31.259158 clma [10058:clma_mds.c:0317] >> clma_mds_enc Aug 2 19:36:31.259166 clma [10058:clma_mds.c:0352] T2 msgtype: 0 Aug 2 19:36:31.259171 clma [10058:clma_mds.c:0366] T2 api_info.type: 2 Aug 2 19:36:31.259177 clma [10058:clma_mds.c:0118] >> clma_enc_track_start_msg Aug 2 19:36:31.259182 clma [10058:clma_mds.c:0134] << clma_enc_track_start_msg Aug 2 19:36:31.259187 clma [10058:clma_mds.c:0407] << clma_mds_enc Aug 2 19:36:31.259260 clma [10058:clma_mds.c:1296] << clma_mds_msg_async_send Aug 2 19:36:31.259272 clma [10058:clma_api.c:0455] << clma_send_md

If the Dispatch API is called again after the controller joins, BAD_HANDLE is returned.

Expected behavior : If the handle is marked as BAD internally, saClmClusterTrack_4 and saClmClusterNodeGet_4 should also return BAD_HANDLE once the controller joins back. Currently only Dispatch returns BAD_HANDLE.
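The expected behavior can be sketched as a handle whose every API goes through the same validity check. This is a toy model and not the CLM agent code: `ClmHandleModel` and its methods are hypothetical, and the return-code values are assumed from the AIS headers.

```python
SA_AIS_OK = 1
SA_AIS_ERR_BAD_HANDLE = 9


class ClmHandleModel:
    """Hypothetical model of a CLM handle across a headless period."""

    def __init__(self):
        self._stale = False

    def on_new_active_controller(self):
        # Models the "Marking handle as BAD" step seen in the agent trace.
        self._stale = True

    def _check(self):
        return SA_AIS_ERR_BAD_HANDLE if self._stale else SA_AIS_OK

    # Every API consults the same flag, so none can report false success.
    def dispatch(self):
        return self._check()

    def cluster_track(self):
        return self._check()

    def cluster_node_get(self):
        return self._check()


handle = ClmHandleModel()
before = (handle.dispatch(), handle.cluster_track(), handle.cluster_node_get())
handle.on_new_active_controller()
after = (handle.dispatch(), handle.cluster_track(), handle.cluster_node_get())
```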
[tickets] [opensaf:tickets] #1869 AMF: SG in unstable for SI lock operation, after HEADLESS
---

** [tickets:#1869] AMF: SG in unstable for SI lock operation, after HEADLESS **

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Thu Jun 09, 2016 07:24 AM UTC by Srikanth R
**Last Updated:** Thu Jun 09, 2016 07:24 AM UTC
**Owner:** nobody

Opensaf version : 5.0 GA

Setup : 5 nodes with 3 controllers. PL-5 and PL-4 hosted the active and standby assignments for the application 2N SUs, whereas SC-3 hosted the spare SU.

Steps performed:
-> Brought down all the controllers to create the headless scenario.
-> Stopped opensafd on PL-5, where SU2 was hosting the active assignment.
-> SU1 on PL-4 did not get the active assignment. It remained in the standby assignment.
-> After the controllers joined back the cluster, the following error message was printed on PL-4.

Aug 26 17:21:34 SCALE_SLOT-94 osafamfnd[19347]: CR SU-SI record addition failed, SU= safSu=SU1,safSg=AmfDemo,safApp=AmfDemo : SI=safSi=AmfDemo,safApp=AmfDemo

SCALE_SLOT-94:~ # immlist safSi=AmfDemo,safApp=AmfDemo
Name                                  Type         Value(s)
saAmfSIPrefStandbyAssignments         SA_UINT32_T  1 (0x1)
saAmfSIPrefActiveAssignments          SA_UINT32_T  1 (0x1)
saAmfSINumCurrStandbyAssignments      SA_UINT32_T  2 (0x2)
saAmfSINumCurrActiveAssignments       SA_UINT32_T  0 (0x0)
saAmfSIAssignmentState                SA_UINT32_T  3 (0x3)

-> A lock operation on the SI then failed because the SG was not stable.
46 04:30:46 07/23/2016 NO safApp=safAmfService "Cluster startup timeout, assigning SIs to SUs" 47 04:30:46 07/23/2016 NO safApp=safAmfService "safSi=AmfDemo,safApp=AmfDemo assigned to safSu=SU1,safSg=AmfDemo,safApp=AmfDemo HA State 'STANDBY'" 48 04:30:46 07/23/2016 NO safApp=safAmfService "Autorepair not done for 'safSu=SC-3,safSg=2N,safApp=OpenSAF'" 49 04:30:46 07/23/2016 NO safApp=safAmfService "Autorepair not done for 'safSu=SU3_Spare,safSg=AmfDemo,safApp=AmfDemo'" 50 07:49:21 07/23/2016 NO safApp=safAmfService "Admin op "LOCK" initiated for 'safSi=AmfDemo,safApp=AmfDemo', invocation: 219043332097" 51 07:49:21 07/23/2016 NO safApp=safAmfService "Admin op invocation: 219043332097, err: 'SI lock of 'safSi=AmfDemo,safApp=AmfDemo' failed, SG not stable'" 52 07:49:21 07/23/2016 NO safApp=safAmfService "Admin op done for invocation: 219043332097, result 6" 53 07:49:22 07/23/2016 NO safApp=safAmfService "Admin op "LOCK" initiated for 'safSi=AmfDemo,safApp=AmfDemo', invocation: 2190
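The immlist output in this ticket shows current assignment counts that differ from the preferred ones. A small check like the following makes that mismatch explicit. It is a sketch under stated assumptions: the attribute names are taken from the immlist output, the values are copied by hand into a dictionary rather than read through the IMM API, and `assignment_mismatches` is a hypothetical helper.

```python
# (preferred attribute, current attribute) pairs from the immlist output.
ATTRIBUTE_PAIRS = [
    ("saAmfSIPrefActiveAssignments", "saAmfSINumCurrActiveAssignments"),
    ("saAmfSIPrefStandbyAssignments", "saAmfSINumCurrStandbyAssignments"),
]


def assignment_mismatches(si_attrs):
    """Map each current-count attribute to (preferred, current) when they differ."""
    return {
        curr: (si_attrs[pref], si_attrs[curr])
        for pref, curr in ATTRIBUTE_PAIRS
        if si_attrs[pref] != si_attrs[curr]
    }


# Values as shown for safSi=AmfDemo,safApp=AmfDemo after headless.
si_after_headless = {
    "saAmfSIPrefStandbyAssignments": 1,
    "saAmfSIPrefActiveAssignments": 1,
    "saAmfSINumCurrStandbyAssignments": 2,
    "saAmfSINumCurrActiveAssignments": 0,
}
mismatches = assignment_mismatches(si_after_headless)
```

For the values above, both counts are reported: no active assignment where one is preferred, and two standby assignments where one is preferred.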
[tickets] [opensaf:tickets] #1867 HEADLESS : Payloads went for reboot, in headless state as CPSV got TIMEOUT rc for CLM API
--- ** [tickets:#1867] HEADLESS : Payloads went for reboot, in headless state as CPSV got TIMEOUT rc for CLM API** **Status:** unassigned **Milestone:** 5.0.1 **Created:** Wed Jun 08, 2016 10:54 AM UTC by Srikanth R **Last Updated:** Wed Jun 08, 2016 10:54 AM UTC **Owner:** nobody Version : Opensaf 5.0. GA Setup : Two payloads with three controllers. Steps performed : -> Initially all the nodes are part of the cluster. -> Induced failover by bringing down active, standby and spare in the order. Aug 7 20:30:08 SCALE_SLOT-94 kernel: [5993776.936794] TIPC: Lost contact with <1.1.1> Aug 7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO Sleep done registering IMMND with MDS Aug 7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO MDS: mds_register_callback: dest 2040fa5bb6016 already exist Aug 7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO SUCCESS IN REGISTERING IMMND WITH MDS Aug 7 20:30:08 SCALE_SLOT-94 osafimmnd[2748]: NO Re-introduce-me highestProcessed:6859 highestReceived:6859 Aug 7 20:30:13 SCALE_SLOT-94 osafimmnd[2748]: WA MDS Send Failed to service:IMMD rc:2 Aug 7 20:30:14 SCALE_SLOT-94 osafamfnd[2767]: WA AMF director unexpectedly crashed -> On the both payloads, CKPTND restarted with the following error in syslog. Aug 7 20:30:17 SCALE_SLOT-94 osafckptnd[2787]: ER cpnd clm node get failed with return value:5 Aug 7 20:30:17 SCALE_SLOT-94 osafamfnd[2767]: NO 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart' Aug 7 20:30:17 SCALE_SLOT-94 osafckptnd[14434]: Started -> But CKPTND Instantation failed and finally the node went for reboot. 
Aug 7 20:30:27 SCALE_SLOT-94 osafimmnd[2748]: NO Re-introduce-me highestProcessed:6859 highestReceived:6859 Aug 7 20:30:27 SCALE_SLOT-94 osafimmnd[2748]: WA MDS Send Failed to service:IMMD rc:2 Aug 7 20:30:27 SCALE_SLOT-94 osafamfnd[2767]: NO Instantiation of 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed Aug 7 20:30:27 SCALE_SLOT-94 osafamfnd[2767]: NO Reason: component registration timer expired Aug 7 20:30:27 SCALE_SLOT-94 osafckptnd[14451]: Started ... Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO Instantiation of 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO Reason: component registration timer expired Aug 7 20:30:38 SCALE_SLOT-94 osafimmnd[2748]: NO Re-introduce-me highestProcessed:6859 highestReceived:6859 Aug 7 20:30:38 SCALE_SLOT-94 osafimmnd[2748]: WA MDS Send Failed to service:IMMD rc:2 Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: WA 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State RESTARTING => INSTANTIATION_FAILED Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO avnd_di_oper_send() deferred as AMF director is offline Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: WA Director is down. 
Remove all SIs from 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: NO Component Failover trigerred for 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Failed component: 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: ER 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'got Inst failed Aug 7 20:30:38 SCALE_SLOT-94 osafamfnd[2767]: Rebooting OpenSAF NodeId = 132111 EE Name = , Reason: NCS component Instantiation failed, OwnNodeId = 132111, SupervisionTime = 60
[tickets] [opensaf:tickets] #1849 CKPT : Performance degradation upto 200%
---

** [tickets:#1849] CKPT : Performance degradation upto 200%**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Wed May 25, 2016 07:29 AM UTC by Srikanth R
**Last Updated:** Wed May 25, 2016 07:29 AM UTC
**Owner:** nobody

Changeset : 7640
Setup : SUSE 11 on physical machines.

CKPT performance in 5.0 is considerably degraded compared to 4.7. A timestamp is taken just before and just after each API call, and the difference between the two is reported as the duration of the call.

-> For write operations, the checkpoint write API takes more than 3 times as long as it did in 4.7. The issue is observed in both synchronous and asynchronous mode.
   ( synchronous -- checkpoint create flag used : SA_CKPT_WR_ALL_REPLICAS
     asynchronous -- checkpoint create flags used : SA_CKPT_WR_ACTIVE_REPLICA | SA_CKPT_CHECKPOINT_COLLOCATED )
-> For section create operations in synchronous mode, the checkpoint section create API takes more than 33% longer than it did in 4.7.
-> For read operations in synchronous mode, the checkpoint read API takes more than 15% longer than it did in 4.7.

Please check the tickets pushed between 4.7 and 5.0 to identify which of them affected API performance.
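The measurement method described in ticket #1849 (a timestamp before and after the API call, then the difference) can be sketched as follows. This is only an illustration: `time_call` and `percent_increase` are hypothetical helper names, the timed function is a stand-in rather than a real CKPT API call, and it assumes the percentages in the ticket are relative increases over the 4.7 value.

```python
import time


def time_call(fn, *args, repeat=5):
    """Return the shortest wall-clock duration (seconds) of fn(*args) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()             # timestamp just before the call
        fn(*args)
        elapsed = time.perf_counter() - start   # timestamp just after, minus the start
        best = min(best, elapsed)
    return best


def percent_increase(baseline, current):
    """Percentage by which `current` exceeds `baseline`."""
    return (current - baseline) / baseline * 100.0


# A call that takes 3 times as long as before corresponds to a 200% increase.
print(percent_increase(1.0, 3.0))  # 200.0
```

Under that assumption, the "more than 3 times" figure for writes lines up with the "200%" in the ticket title.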
[tickets] [opensaf:tickets] #1842 rde: standby amfd notifies to NID early.
If opensafd starts successfully on the standby, the standby node should be ready to take the active role. A failover was performed after the standby had joined the cluster successfully, but the standby node could not take the active role and an entire *CLUSTER RESET* happened because the cluster had no active controller.

On the active controller ::

May 25 11:18:03 CONTROLLER-1 osafimmnd[2281]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
May 25 11:18:03 CONTROLLER-1 osafamfd[2342]: NO Received node_up from 2020f: msg_id 1
May 25 11:18:04 CONTROLLER-1 osafamfd[2342]: NO Node 'SC-2' joined the cluster
May 25 11:18:04 CONTROLLER-1 osafimmnd[2281]: NO Implementer connected: 19 (MsgQueueService131599) <0, 2020f>
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Peer up on node 0x2020f
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Got peer info request from node 0x2020f with role STANDBY
May 25 11:18:04 CONTROLLER-1 osafrded[2249]: NO Got peer info response from node 0x2020f with role STANDBY
May 25 11:18:04 CONTROLLER-1 osafimmnd[2281]: NO Implementer (applier) connected: 20 (@safAmfService2020f) <0, 2020f>
May 25 11:18:05 CONTROLLER-1 osafimmnd[2281]: NO Implementer (applier) connected: 21 (@OpenSafImmReplicatorB) <0, 2020f>
May 25 11:18:05 CONTROLLER-1 osafamfnd[2353]: NO 'safComp=CPD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

On the standby controller ::

May 25 11:18:04 CONTROLLER-2 osafrded[4212]: NO Got peer info response from node 0x2010f with role ACTIVE
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN AMF HA STANDBY request
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 564114611150864
May 25 11:18:04 CONTROLLER-2 osafamfnd[4292]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 565214191280144
May 25 11:18:04 CONTROLLER-2 opensafd: OpenSAF(5.0.0 - ) services successfully started
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 567412731609092
May 25 11:18:04 CONTROLLER-2 osafimmd[4231]: IN node with dest ADDED 566316589850628
done
CONTROLLER-2:~ #
May 25 11:18:04 CONTROLLER-2 osafimmnd[4242]: NO Implementer (applier) connected: 20 (@safAmfService2020f) <139, 2020f>
May 25 11:18:04 CONTROLLER-2 osafimmnd[4242]: NO Implementer (applier) connected: 21 (@OpenSafImmReplicatorB) <147, 2020f>
May 25 11:18:04 CONTROLLER-2 osafntfimcnd[4446]: NO Started
May 25 11:18:05 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 12 <0, 2010f> (safCheckPointService)
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: NO Node Down event for node id 2010f:
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: NO Current role: STANDBY
May 25 11:18:10 CONTROLLER-2 osaffmd[4221]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, SupervisionTime = 60
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200249] TIPC: Resetting link <1.1.2:eth3-1.1.1:eth0>, peer not responding
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200263] TIPC: Lost link <1.1.2:eth3-1.1.1:eth0> on network plane A
May 25 11:18:10 CONTROLLER-2 kernel: [ 2246.200272] TIPC: Lost contact with <1.1.1>
May 25 11:18:10 CONTROLLER-2 osafrded[4212]: NO Peer down on node 0x2010f
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: IN Resend of fevs message 52769, will not mbcp to peer IMMD
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA DISCARD DUPLICATE FEVS message:52769
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA Error code 2 returned for message type 82 - ignoring
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: IN Resend of fevs message 52770, will not mbcp to peer IMMD
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA DISCARD DUPLICATE FEVS message:52770
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: WA Error code 2 returned for message type 82 - ignoring
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: WA IMMND DOWN on active controller 1 detected at standby immd!! 2. Possible failover
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: NO Skipping re-send of fevs message 52769 since it has recently been resent.
May 25 11:18:10 CONTROLLER-2 osafimmd[4231]: NO Skipping re-send of fevs message 52770 since it has recently been resent.
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Global discard node received for nodeId:2010f pid:2281
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 13 <0, 2010f(down)> (OpenSafImmPBE)
May 25 11:18:10 CONTROLLER-2 osafimmnd[4242]: NO Implementer disconnected 10 <0, 2010f(down)> (safSmfService)
May 25 11:18:10 CONTROLLER-2 os
[tickets] [opensaf:tickets] #1845 IMM: Standby syncing is delayed untill payloads join the cluster
---

** [tickets:#1845] IMM: Standby syncing is delayed untill payloads join the cluster**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Mon May 23, 2016 10:46 AM UTC by Srikanth R
**Last Updated:** Mon May 23, 2016 10:46 AM UTC
**Owner:** nobody
**Attachments:**

- [1845.tgz](https://sourceforge.net/p/opensaf/tickets/1845/attachment/1845.tgz) (10.6 MB; application/x-compressed-tar)

Changeset : 7640
Setup : 4-node cluster with PBE of 50k objects. SUSE 11.2 VM

Issue : Standby syncing with the active controller is delayed until the payloads join the cluster.

Steps :

1. PBE is enabled on the setup and a 50k-object DB was created earlier.

2. Started opensaf on all the nodes simultaneously.

May 23 15:32:49 CONTROLLER-1 osafrded[27818]: NO Requesting ACTIVE role
May 23 15:32:49 CONTROLLER-1 osafrded[27818]: NO RDE role set to Undefined
May 23 15:32:49 CONTROLLER-1 kernel: [10654.360773] TIPC: Established link <1.1.1:eth0-1.1.2:eth3> on network plane A
May 23 15:32:49 CONTROLLER-1 kernel: [10654.881929] TIPC: Established link <1.1.1:eth0-1.1.3:eth3> on network plane A
May 23 15:32:50 CONTROLLER-1 kernel: [10655.434543] TIPC: Established link <1.1.1:eth0-1.1.4:eth3> on network plane A
May 23 15:32:51 CONTROLLER-1 osafimmd[27837]: NO Attached Nodes:3 Accepted nodes:0 KnownVeteran:0 doReply:1

3. Opensafd on SC-1 started at May 23 15:33:41.

4. Opensafd on PL-3 and PL-4 started at 15:33:42.

5. On SC-2, IMM syncing started at 15:33:47, after the active controller and the payloads had joined.

May 23 15:33:47 CONTROLLER-1 osafimmd[27837]: NO Successfully announced sync. New ruling epoch:9
May 23 15:33:47 CONTROLLER-1 osafimmloadd: NO Sync starting

6. At 15:33:57, SC-2 joined the cluster successfully.

Because of this, the time taken for the standby to join during simultaneous startup of all nodes has increased by the sync time.

The value of IMMSV_NUM_NODES is the default (5) and was not changed.
[tickets] [opensaf:tickets] #1836 CKPT: Section is not deleted during switchover, after expiration time
- **status**: unassigned --> invalid
- **Comment**:

This issue is not observed with a sleep of 1.2 seconds.

---

** [tickets:#1836] CKPT: Section is not deleted during switchover, after expiration time**

**Status:** invalid
**Milestone:** 4.7.2
**Created:** Wed May 18, 2016 01:20 AM UTC by Srikanth R
**Last Updated:** Wed May 18, 2016 02:36 AM UTC
**Owner:** nobody
**Attachments:**

- [1836.tgz](https://sourceforge.net/p/opensaf/tickets/1836/attachment/1836.tgz) (224.7 kB; application/x-compressed-tar)

Sometimes a section is not deleted from a checkpoint after its expiry time during a switchover. The steps performed were:

1. Created a checkpoint with ALL_REPLICAS.
2. Opened the checkpoint for writing.
3. Created a section.
4. Set the expiration time to 1 second.
5. Invoked a middleware switchover.
6. After 1.1 seconds, accessed the checkpoint section by deleting it.

The expected return value is ERR_NOT_EXIST, but the section deletion succeeded with SA_AIS_OK.

Without switchovers, this issue is observed about once in 15 runs on an idle setup.
[tickets] [opensaf:tickets] #1836 CKPT: Section is not deleted during switchover, after expiration time
This issue is observed on SLES VMs. We shall modify the application and try with sleep of 1.2 seconds or more and check whether the issue is observed.
[tickets] [opensaf:tickets] #1801 lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE returning SA_AIS_ERR_TIMEOUT after 5 failovers.
Similarly, saLckResourceOpen returns SA_AIS_ERR_LIBRARY after switchovers / failovers. This issue is observed randomly.

---

** [tickets:#1801] lck: saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE returning SA_AIS_ERR_TIMEOUT after 5 failovers.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 02, 2016 09:52 AM UTC by Madhurika Koppula
**Last Updated:** Wed May 04, 2016 06:53 PM UTC
**Owner:** nobody
**Attachments:**

- [glsv.tgz](https://sourceforge.net/p/opensaf/tickets/1801/attachment/glsv.tgz) (3.0 MB; application/octet-stream)

Setup:
Changeset- 7436
OS: Oracle Linux Server release 6.4 (x86_64)
4 nodes configured with single PBE. Some failover tests are being run.

The safLock=resource1_101 object is not getting deleted. As a result, saLckResourceOpen with flag SA_LCK_RESOURCE_CREATE continuously returns SA_AIS_ERR_TIMEOUT. The same API call is retried 15 times, with a sleep of 10 seconds.

Snippet from the run:

100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle : 6599312
100|7|
100|7|
100|7| SUCCESS : saLckInitialize with valid parameters
100|7| Return Value: SA_AIS_OK
100|7| LckHandle : 6599392
100|7|
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7| FAILED : saLckResourceOpen with valid parameters
100|7| Return Value: SA_AIS_ERR_TIMEOUT
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
100|7|
100|7| Resource Name : safLock=resource1_101
100|7| open flags : SA_LCK_RESOURCE_CREATE
Timeout count exceeded: 15

Timestamp of the Active controller at this instant:

May 2 14:22:56 OEL_M-SLOT-2 root: killing osafimmd from run_failover.sh
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
May 2 14:22:56 OEL_M-SLOT-2 osafamfnd[1755]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
May 2 14:22:56 OEL_M-SLOT-2 opensaf_reboot: Rebooting local node; timeout=60

Timestamp of the Standby controller which is becoming active after failover:

May 2 14:23:00 OEL_M-SLOT-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
May 2 14:23:00 OEL_M-SLOT-1 osaffmd[1677]: NO Controller Failover: Setting role to ACTIVE
May 2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO RDE role set to ACTIVE
May 2 14:23:00 OEL_M-SLOT-1 osafrded[1667]: NO Running '/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
May 2 14:23:00 OEL_M-SLOT-1 osafimmd[1688]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osaflogd[1711]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafntfd[1722]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafclmd[1733]: NO ACTIVE request
May 2 14:23:00 OEL_M-SLOT-1 osafamfd[1744]: NO FAILOVER StandBy --> Active

/var/log/messages and osaflckd traces of both controllers are attached.
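The retry behaviour reported in ticket #1801 (the same call repeated with a 10-second sleep until a limit of 15 is reached) can be sketched roughly as below. This is not the test suite's actual code: `retry_on_timeout` and `always_timeout` are hypothetical names, the return codes are modelled as plain strings, and the sketch assumes that "15 times" means 15 attempts in total.

```python
import time

SA_AIS_OK = "SA_AIS_OK"
SA_AIS_ERR_TIMEOUT = "SA_AIS_ERR_TIMEOUT"


def retry_on_timeout(call, max_attempts=15, delay_s=10.0, sleep=time.sleep):
    """Invoke call() until it returns something other than SA_AIS_ERR_TIMEOUT.

    Returns a tuple (last_return_code, attempts_made).
    """
    rc = None
    for attempt in range(1, max_attempts + 1):
        rc = call()
        if rc != SA_AIS_ERR_TIMEOUT:
            return rc, attempt
        if attempt < max_attempts:
            sleep(delay_s)  # wait before the next attempt
    return rc, max_attempts


def always_timeout():
    # Hypothetical stand-in for a resource-open call that never succeeds.
    return SA_AIS_ERR_TIMEOUT


# The sleep is stubbed out so the example finishes immediately.
rc, attempts = retry_on_timeout(always_timeout, sleep=lambda seconds: None)
print(rc, attempts)  # SA_AIS_ERR_TIMEOUT 15
```

With the real 10-second delay, exhausting all attempts takes at least 140 seconds of sleeping, which gives an idea of how long the object stayed undeleted in the reported run.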
[tickets] [opensaf:tickets] #1826 AMF: In Nway, pref Active assignments violated for SI after SU lock op
---

** [tickets:#1826] AMF: In Nway, pref Active assignments violated for SI after SU lock op**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 16, 2016 02:21 AM UTC by Srikanth R
**Last Updated:** Mon May 16, 2016 02:21 AM UTC
**Owner:** nobody
**Attachments:**

- [1821.tgz](https://sourceforge.net/p/opensaf/tickets/1826/attachment/1821.tgz) (562.9 kB; application/x-compressed-tar)

Setup :
Changeset : 7613
AMF application : NWay, 4 SUs and 4 SIs with maxActiveSisPerSu=2 and maxStandbySisPerSu=2

Issue : After a lock operation on an SU, saAmfSIPrefActiveAssignments is violated for an SI. The SI got two active assignments.

Steps performed :

1. Initially brought up the Nway application, for which all SIs are assigned.

            | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
TestApp_SU1 | STANDBY     |             |             | ACTIVE
TestApp_SU2 |             |             | ACTIVE      | STANDBY
TestApp_SU3 |             | ACTIVE      | STANDBY     |
TestApp_SU4 | ACTIVE      | STANDBY     |             |

2. Now performed a lock operation on an SU, after which SI3 and SI4 got two active assignments.

            | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
TestApp_SU1 | ACTIVE      | STANDBY     | STANDBY     | ACTIVE
TestApp_SU2 | STANDBY     |             | ACTIVE      | ACTIVE
TestApp_SU3 |             | ACTIVE      | ACTIVE      | STANDBY
TestApp_SU4 |             |             |             |

As the preferred number of active assignments is set to 1, SI3 cannot be given more than 1 active assignment.

immlist safSi=TestApp_SI3,safApp=TestApp_Nway
saAmfSIPrefStandbyAssignments     SA_UINT32_T 1 (0x1)
saAmfSIPrefActiveAssignments      SA_UINT32_T 1 (0x1)
saAmfSINumCurrStandbyAssignments  SA_UINT32_T 1 (0x1)
saAmfSINumCurrActiveAssignments   SA_UINT32_T 2 (0x2)
saAmfSIAssignmentState            SA_UINT32_T 3 (0x3)
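The violation reported in ticket #1826 can be checked mechanically from the second assignment table. The sketch below transcribes that table into a dictionary and counts ACTIVE assignments per SI. The helper names are hypothetical, and the sketch assumes a preferred active count of 1 for every SI (the ticket shows this value explicitly only for TestApp_SI3).

```python
from collections import Counter

# Assignment table after the SU lock, transcribed from the ticket.
after_lock = {
    "TestApp_SU1": {"TestApp_SI1": "ACTIVE", "TestApp_SI2": "STANDBY",
                    "TestApp_SI3": "STANDBY", "TestApp_SI4": "ACTIVE"},
    "TestApp_SU2": {"TestApp_SI1": "STANDBY", "TestApp_SI3": "ACTIVE",
                    "TestApp_SI4": "ACTIVE"},
    "TestApp_SU3": {"TestApp_SI2": "ACTIVE", "TestApp_SI3": "ACTIVE",
                    "TestApp_SI4": "STANDBY"},
    "TestApp_SU4": {},
}


def active_counts(table):
    """Count how many SUs hold an ACTIVE assignment for each SI."""
    counts = Counter()
    for su_assignments in table.values():
        for si, state in su_assignments.items():
            if state == "ACTIVE":
                counts[si] += 1
    return counts


def violations(table, pref_active=1):
    """Return the SIs whose ACTIVE count exceeds the preferred value (assumed 1)."""
    return sorted(si for si, n in active_counts(table).items() if n > pref_active)


print(violations(after_lock))  # ['TestApp_SI3', 'TestApp_SI4']
```

The result agrees with the immlist output in the ticket, where saAmfSINumCurrActiveAssignments is 2 for TestApp_SI3.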
[tickets] [opensaf:tickets] #1757 Standby controller failed to join the cluster
- **summary**: Standby controller failed to join the cluster probably because of setup issues --> Standby controller failed to join the cluster
- **status**: unassigned --> duplicate
- **Milestone**: 4.7.2 --> never
- **Comment**:

Closing this ticket as a duplicate of #1701. In both tickets, the CLM client (i.e. osafamfnd) received SA_AIS_ERR_NO_MEMORY during initialization.

---

** [tickets:#1757] Standby controller failed to join the cluster**

**Status:** duplicate
**Milestone:** never
**Created:** Wed Apr 13, 2016 11:12 AM UTC by Ritu Raj
**Last Updated:** Wed May 04, 2016 06:56 PM UTC
**Owner:** nobody

*Setup:
Changeset- 7436
Version - opensaf 5.0FC
OS: SUSE 11SP2 x86_64

*Issue observed :
Standby controller failed to join the cluster with error message "ER Failed to Initialize with CLM"

*Steps To Reproduce:
> OpenSAF is already up and running on controller1 (SC-1)
> When OpenSAF was started on controller2 (SC-2), it failed with the following message:

SCALE_SLOT-2:~ # /etc/init.d/opensafd start
Apr 26 20:11:28 SCALE_SLOT-2 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)
Starting OpenSAF Services (Using TIPC):
Apr 26 20:11:28 SCALE_SLOT-2 kernel: [1930938.251473] TIPC: Activated (version 2.0.0)
...
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: Started
**Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER Failed to Initialize with CLM: 8**
**Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER avnd_create failed**
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: NO exiting

> The corresponding syslog of the active controller (SC-1) at that time:

Apr 26 20:08:51 SCALE_SLOT-1 osafclmd[31692]: WA FAILED:** ncs_patricia_tree_add, client_id** 53
Apr 26 20:08:51 SCALE_SLOT-1 osafamfd[31702]: NO Node 'SC-2' left the cluster

>> It is also observed that on the active controller (SC-1) there is no osafclmd log record for the period in which controller2 (SC-2) failed, while other services have log records for that time.

Below is the osafclmd trace from SC-1. Between time stamps "Apr 26 20:08:51.237701" and "Apr 26 20:12:06.272871", osafclmd did not log anything.

Apr 26 20:08:51.237695 osafclmd [31692:clms_evt.c:1601] << process_api_evt
**Apr 26 20:08:51.237701 osafclmd [31692:clms_evt.c:1667] << clms_process_mbx**
**Apr 26 20:12:06.272871 osafclmd [31692:ava_mds.c:0179] >> ava_mds_cbk**
Apr 26 20:12:06.272923 osafclmd [31692:ava_mds.c:0530] >> ava_mds_flat_dec

Note:
1. This is a random issue.
2. The time gap between controller1 (SC-1) and controller2 (SC-2) is 3 min.
[tickets] [opensaf:tickets] #1800 AMF : Proxied should be brought down initially during NG lock-in admin op
---

** [tickets:#1800] AMF : Proxied should be brought down initially during NG lock-in admin op**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Sat Apr 30, 2016 07:50 AM UTC by Srikanth R
**Last Updated:** Sat Apr 30, 2016 07:50 AM UTC
**Owner:** nobody

Changeset : 7436
Setup : 2N red model with proxy and proxied SU hosted on the same node.

During a lock-in operation on a node group, the proxied SU should be brought down first (i.e., the component termination callback should be sent for the proxied component) and the proxy SU should be brought down afterwards. In the current implementation, however, the proxy SU is brought down first and the proxied SU is brought down afterwards, which fails because there is no longer a proxy.

436 05:30:00 01/01/1970 NO safApp=safAmfService "Admin op "LOCK_INSTANTIATION" initiated for 'safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster', invocation: 300647710721"
437 05:30:00 01/01/1970 NO safApp=safAmfService "safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION"
438 05:30:00 01/01/1970 NO safApp=safAmfService "safAmfNode=SC-1,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION"
439 05:30:00 01/01/1970 NO safApp=safAmfService "safAmfNode=SC-2,safAmfCluster=myAmfCluster AdmState LOCKED => LOCKED_INSTANTIATION"
440 05:30:00 01/01/1970 NO safApp=safAmfService "safComp=proxied,safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N ProxyStatus is now UNPROXIED"
441 05:30:00 01/01/1970 NO safApp=safAmfService "safSu=PROXY_SU1_2N,safSg=PROXY_SG_2N,safApp=PROXY_2N PresenceState TERMINATING => UNINSTANTIATED"
442 05:30:00 01/01/1970 NO safApp=safAmfService "safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N PresenceState TERMINATING => UNINSTANTIATED"
443 05:30:00 01/01/1970 NO safApp=safAmfService "safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N OperState ENABLED => DISABLED"
444 05:30:00 01/01/1970 NO safApp=safAmfService "Autorepair not done for 'safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N'"
445 05:30:00 01/01/1970 NO safApp=safAmfService "Admin op done for invocation: 300647710721, result 1"
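The ordering requested in ticket #1800 (proxied SU first, proxy SU afterwards) can be illustrated with a small sort. This is only a sketch of the expected ordering, not how AMF implements termination: `termination_order`, the `kind` labels, and the "plain" category for SUs that are neither proxy nor proxied are all hypothetical.

```python
def termination_order(sus):
    """Order SUs so that proxied SUs are terminated before their proxy SUs.

    `sus` is a list of (name, kind) pairs, where kind is "proxied",
    "plain" or "proxy". The sort is stable, so SUs of the same kind
    keep their original relative order.
    """
    rank = {"proxied": 0, "plain": 1, "proxy": 2}
    return [name for name, kind in sorted(sus, key=lambda su: rank[su[1]])]


sus = [
    ("safSu=PROXY_SU1_2N,safSg=PROXY_SG_2N,safApp=PROXY_2N", "proxy"),
    ("safSu=PROXIED_SU1_2N,safSg=PROXIED_SG_2N,safApp=PROXIED_2N", "proxied"),
]
for name in termination_order(sus):
    print(name)
```

Terminating in this order keeps the proxy available while the termination callback for the proxied component is delivered through it.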
[tickets] [opensaf:tickets] #1799 AMF : csiName and csiFlags are not properly populated, during assignment removal ( proxy)
- **summary**: AMF : csiName and csiFlags are not properly populated, during assignment removal --> AMF : csiName and csiFlags are not properly populated, during assignment removal ( proxy)
- Description has changed:

Diff:

--- old
+++ new
@@ -3,6 +3,6 @@
 * Initially the proxy and proxied are in fully assigned state.
-* Now perform lock operation on proxy SU, for which quiesced callback and csi removal callback is populating the csiFlags as SA_AMF_CSI_TARGET_ALL and csiName is populated as NULL. But the proxy component is having active assignments , both for proxy and proxied. Similar is for lock operation is on proxied SU.
+* Now perform lock operation on proxy SU, for which quiesced callback and csi removal callback is populating the csiFlags as SA_AMF_CSI_TARGET_ALL and csiName is populated as NULL. But the proxy component is having active assignments , which is to be removed according to the callback . Similar is for lock operation is on proxied SU.
 So expectation is that for lock operation on either proxy / proxied SU ,csiFlags should be populated as SA_AMF_CSI_TARGET_ONE with the corresponding csi.

---

** [tickets:#1799] AMF : csiName and csiFlags are not properly populated, during assignment removal ( proxy)**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Sat Apr 30, 2016 06:17 AM UTC by Srikanth R
**Last Updated:** Sat Apr 30, 2016 06:17 AM UTC
**Owner:** nobody

Changeset : 7436
Setup : 2N red model with both proxy and proxied hosted on the same node.

* Initially the proxy and proxied are in the fully assigned state.

* Now perform a lock operation on the proxy SU. The quiesced callback and the CSI removal callback populate csiFlags as SA_AMF_CSI_TARGET_ALL and csiName as NULL. But the proxy component has active assignments, which are to be removed according to the callback. The same applies when the lock operation is on the proxied SU.

The expectation is that for a lock operation on either the proxy or the proxied SU, csiFlags should be populated as SA_AMF_CSI_TARGET_ONE with the corresponding CSI.
[tickets] [opensaf:tickets] #1799 AMF : csiName and csiFlags are not properly populated, during assignment removal
---

** [tickets:#1799] AMF : csiName and csiFlags are not properly populated, during assignment removal**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Sat Apr 30, 2016 06:17 AM UTC by Srikanth R
**Last Updated:** Sat Apr 30, 2016 06:17 AM UTC
**Owner:** nobody

Changeset : 7436
Setup : 2N redmodel with both proxy and proxied hosted on the same node.

* Initially the proxy and proxied are in fully assigned state.
* Now perform lock operation on proxy SU, for which quiesced callback and csi removal callback is populating the csiFlags as SA_AMF_CSI_TARGET_ALL and csiName is populated as NULL. But the proxy component is having active assignments , both for proxy and proxied. Similar is for lock operation is on proxied SU.

So expectation is that for lock operation on either proxy / proxied SU, csiFlags should be populated as SA_AMF_CSI_TARGET_ONE with the corresponding csi.
[tickets] [opensaf:tickets] #1792 osaf: update opensaf status command to reflect spares
It would be easier to understand if the rdegetrole command on spare controllers reported "SPARE" instead of QUIESCED.

---

** [tickets:#1792] osaf: update opensaf status command to reflect spares**

**Status:** unassigned
**Milestone:** 5.0.GA
**Created:** Thu Apr 28, 2016 08:03 PM UTC by Mathi Naickan
**Last Updated:** Thu Apr 28, 2016 08:03 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #1780 Imm: Imm service is down FOREVER on the nodes after IMMND restart , due to system issues
- **summary**: Imm: Immfind failed with TRY_AGAIN after immnd is killed on payload PL-3 and on active controller --> Imm: Imm service is down FOREVER on the nodes after IMMND restart , due to system issues
- **Comment**: Changing the ticket heading for better understanding

---

** [tickets:#1780] Imm: Imm service is down FOREVER on the nodes after IMMND restart , due to system issues**

**Status:** invalid
**Milestone:** 5.0.RC2
**Created:** Mon Apr 25, 2016 11:46 AM UTC by Madhurika Koppula
**Last Updated:** Tue Apr 26, 2016 06:54 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [imm.tgz](https://sourceforge.net/p/opensaf/tickets/1780/attachment/imm.tgz) (13.2 kB; application/octet-stream)

Setup:
Changeset- 7436
OS: Oracle Linux Server release 6.4 (x86_64)
Version - opensaf 5.0
4 nodes configured with single PBE

immfind fails with TRY_AGAIN after immnd is killed on PL-3 and on the active controller. IMM admin operations keep failing forever on PL-3 and SC-1 (Active) even though immnd was restarted properly on PL-3 and SC-1. (Initialize itself is failing.)

Steps To reproduce:

1) Kill immnd on the active controller and on PL-3.
2) Perform any IMM admin operation.

Here is the snippet.

[root@OEL_M-SLOT-3 log]# immfind
error - saImmOmInitialize FAILED: SA_AIS_ERR_TRY_AGAIN (6)
[root@OEL_M-SLOT-3 log]#

1st kill of IMMND on the ACTIVE controller, at the timestamp below:

Apr 25 11:48:52 OEL_M-SLOT-1 osafntfimcnd[9124]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Apr 25 11:48:52 OEL_M-SLOT-1 osafimmd[1716]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
Apr 25 11:48:52 OEL_M-SLOT-1 osafimmd[1716]: NO New coord elected, resides at 2020f
Apr 25 11:48:53 OEL_M-SLOT-1 osafamfnd[1796]: NO Restarting a component of 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 25 11:48:53 OEL_M-SLOT-1 osafamfnd[1796]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: Started
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmd[1716]: NO New IMMND process is on ACTIVE Controller at 2010f

2nd kill of IMMND on the ACTIVE controller, at the timestamp below:

Apr 25 14:44:52 OEL_M-SLOT-1 osafamfnd[1796]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: Started
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: NO New IMMND process is on ACTIVE Controller at 2010f
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: NO Extended intro from node 2010f
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SETTING COORD TO 0 CLOUD PROTO
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: WA IMMND on controller (not currently coord) requests sync
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO NODE STATE-> IMM_NODE_ISOLATED

Kill of IMMND on PL-3, at the timestamp below:

Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns)
Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'

Attaching the log snippets of immnd on the active controller and on PL-3, and /var/log/messages.

This issue might be related to ticket #1735, because the node state of immnd on PL-3 is also observed as IMM_NODE_ISOLATED. But immfind never succeeded on the active SC-1 even though immnd restarted successfully on SC-1 at the timestamp below:

Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> IMM_NODE_ISOLATED
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmd[1716]: NO Node 2010f request sync sync-pid:10126 epoch:0
Apr 25 11:48:54 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Apr 25 11:48:54 OEL_M-SLOT-1 osafimmd[1716]: NO Successfully a
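To compare how far each restarted immnd instance got, the state-transition lines in the syslog can be reduced to the last state seen per process. The following is a minimal sketch that assumes the syslog line format shown in this ticket; `last_states` and the regular expression are illustrative helpers, not part of OpenSAF.

```python
import re

# Matches the two kinds of osafimmnd state lines quoted in the ticket.
STATE_RE = re.compile(
    r"osafimmnd\[(?P<pid>\d+)\]: NO "
    r"(?:SERVER STATE: \S+ --> (?P<server>\S+)|NODE STATE-> (?P<node>\S+))"
)


def last_states(lines):
    """Map each osafimmnd pid to its last seen (server_state, node_state)."""
    states = {}
    for line in lines:
        match = STATE_RE.search(line)
        if match is None:
            continue
        pid = match.group("pid")
        server, node = states.get(pid, (None, None))
        if match.group("server"):
            server = match.group("server")
        if match.group("node"):
            node = match.group("node")
        states[pid] = (server, node)
    return states


sample = [
    "Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING",
    "Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> IMM_NODE_ISOLATED",
    "Apr 25 11:48:54 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> IMM_NODE_W_AVAILABLE",
    "Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING",
    "Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO NODE STATE-> IMM_NODE_ISOLATED",
]

for pid, (server, node) in last_states(sample).items():
    print(pid, server, node)
```

On the lines quoted above, the first restart (pid 10126) reaches IMM_NODE_W_AVAILABLE, while the last line quoted for the second restart (pid 15848) is still IMM_NODE_ISOLATED.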
[tickets] [opensaf:tickets] #1546 AMF : Lock of node should be allowed similar to ng, if more than one SU is hosted
- **Type**: enhancement --> defect
- **Comment**:

Changing the ticket type to defect, as the SU gets stuck in terminating state with the following steps.

* Deploy two SUs on a single payload, for which active and standby assignments are done (2N).
* Perform a lock operation on the payload's CLM object. The lock operation fails with the ERR_REPAIR_PENDING return code.
* Now perform a lock operation on the application SG, followed by a lock-in operation, after which the SU gets stuck in terminating state.

---

** [tickets:#1546] AMF : Lock of node should be allowed similar to ng, if more than one SU is hosted**

**Status:** unassigned
**Milestone:** future
**Created:** Thu Oct 15, 2015 03:38 AM UTC by Srikanth R
**Last Updated:** Wed Jan 13, 2016 06:29 AM UTC
**Owner:** nobody

Changeset : 6901

Currently for 2N, lock of a node group is allowed if more than one SU is hosted on a member node of the node group. But lock of a node is not allowed if more than one SU is hosted.

# amf-adm lock safAmfNode=SC-2,safAmfCluster=myAmfCluster
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_NOT_SUPPORTED (19)
error-string: Node lock/shutdown not allowed with two SUs on same node

# amf-adm lock safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster
safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster UNLOCKED --> LOCKED
safAmfNode=SC-1,safAmfCluster=myAmfCluster UNLOCKED --> LOCKED
safAmfNode=SC-2,safAmfCluster=myAmfCluster UNLOCKED --> LOCKED
safSi=TWONSI5,safApp=TWONAPP FULLYASSIGNED --> PARTIALLYASSIGNED
safSi=TWONSI5,safApp=TWONAPP Alarm MAJOR
[tickets] [opensaf:tickets] #1795 AMF : haState should be marked QUIESCING in PG callback for shutdown op
---

** [tickets:#1795] AMF : haState should be marked QUIESCING in PG callback for shutdown op**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Fri Apr 29, 2016 07:24 AM UTC by Srikanth R
**Last Updated:** Fri Apr 29, 2016 07:24 AM UTC
**Owner:** nobody

Changeset : 7434

For the shutdown operation on the SI, the haState is filled up with the value SA_AMF_HA_QUIESCED (3), instead of SA_AMF_HA_QUIESCING (4) in the protection group callback.

PROTECTION GROUP CALLBACK IS INVOKED
error : 1
numberOfMembers : 2
csiName : safCsi=CSI1,safSi=TestApp_SI1,safApp=TestApp_TwoN
number of items in notification buffer is 2
{0: {'member': {'haState': 2, 'compName': safComp=COMP1,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN, 'rank': 1, 'haReadinessState': 1}, 'change': 1},
 1: {'member': {'haState': **3**, 'compName': safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN, 'rank': 2, 'haReadinessState': 1}, 'change': 4}}
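The numeric fields in the notification buffer are easier to read once mapped back to their enum names. The sketch below assumes the values of `SaAmfHAStateT` and `SaAmfProtectionGroupChangesT` from the SA Forum AMF header; `describe_members` is a hypothetical helper, and the buffer is retyped from the dump in this ticket with the component names as plain strings.

```python
# Enum values assumed from the SA Forum AMF header (saAmf.h).
HA_STATE = {
    1: "SA_AMF_HA_ACTIVE",
    2: "SA_AMF_HA_STANDBY",
    3: "SA_AMF_HA_QUIESCED",
    4: "SA_AMF_HA_QUIESCING",
}
PG_CHANGE = {
    1: "SA_AMF_PROTECTION_GROUP_NO_CHANGE",
    2: "SA_AMF_PROTECTION_GROUP_ADDED",
    3: "SA_AMF_PROTECTION_GROUP_REMOVED",
    4: "SA_AMF_PROTECTION_GROUP_STATE_CHANGE",
}


def describe_members(notification_buffer):
    """Return (compName, haState name, change name) for each buffer entry."""
    rows = []
    for index in sorted(notification_buffer):
        item = notification_buffer[index]
        member = item["member"]
        rows.append((
            member["compName"],
            HA_STATE.get(member["haState"], "unknown"),
            PG_CHANGE.get(item["change"], "unknown"),
        ))
    return rows


# Buffer as reported in the ticket.
buffer_from_ticket = {
    0: {"member": {"haState": 2, "rank": 1, "haReadinessState": 1,
                   "compName": "safComp=COMP1,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN"},
        "change": 1},
    1: {"member": {"haState": 3, "rank": 2, "haReadinessState": 1,
                   "compName": "safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN"},
        "change": 4},
}

for row in describe_members(buffer_from_ticket):
    print(row)
```

Decoded this way, the second member shows a state change to SA_AMF_HA_QUIESCED, where the reporter expects SA_AMF_HA_QUIESCING for a shutdown operation.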
[tickets] [opensaf:tickets] #1794 AMF : amfd crashed on both controllers, after opensafd is stopped on appl hosted payloads
---

** [tickets:#1794] AMF : amfd crashed on both controllers, after opensafd is stopped on appl hosted payloads **

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Fri Apr 29, 2016 06:48 AM UTC by Srikanth R
**Last Updated:** Fri Apr 29, 2016 06:48 AM UTC
**Owner:** nobody
**Attachments:**

- [1794.tgz](https://sourceforge.net/p/opensaf/tickets/1794/attachment/1794.tgz) (3.8 MB; application/x-compressed-tar)

Changeset : 7436 5.0.FC
Setup : 5 nodes cluster with 3 payloads.
Application : 2n red model , 3 SUs with 4 SIs ( si-si dep configured )
PL-3 is hosting SU1 and SU3 and PL-4 is hosting SU2.

Issue : AMFD on both controllers crashed , after opensafd is stopped on application hosted payloads.

Steps performed :

-> After deploying application, lot of AMF related operations have been performed.

-> After that, following is the opensafd status , where SU1 deployed on PL-3 is standby and SU2 deployed on PL-4 is active.

safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed6,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed5,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)

-> Now stopped opensafd on the payloads PL-5 and PL-4, one after another.

-> Amfd on the active controller crashed after opensafd is stopped on PL-4.

Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: NO Node 'PL-4' left the cluster
Apr 28 16:47:54 CONTROLLER-2 osafamfd[12188]: sg_2n_fsm.cc:534: avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
Apr 28 16:47:54 CONTROLLER-2 osafamfnd[12198]: WA AMF director unexpectedly crashed

Note, this issue is not reproducible just by bringing up the application and performing the above steps.
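In the status listing above, both TestApp_SU2 and TestApp_SU3 report ACTIVE for the same SIs, which looks consistent with the failed assertion `a_susi_1->su == a_susi_2->su` (this is an observation from the listing, not a confirmed root cause). A minimal sketch for spotting such cases in a listing of this shape follows; `active_sus_per_si`, `multiple_active` and the regular expression are illustrative helpers that assume the one-entry-per-line format shown here.

```python
import re
from collections import defaultdict

# One SISU entry: escaped SU DN, then the SI DN, then the HA state.
SISU_RE = re.compile(
    r"safSISU=safSu=([^\\,]+)\\,\S*?,safSi=([^,\s]+),\S+\s+"
    r"saAmfSISUHAState=(\w+)\("
)


def active_sus_per_si(text):
    """Map each SI name to the set of SU names that report ACTIVE for it."""
    active = defaultdict(set)
    for su, si, state in SISU_RE.findall(text):
        if state == "ACTIVE":
            active[si].add(su)
    return dict(active)


def multiple_active(text):
    """Return only the SIs that have more than one ACTIVE SU."""
    return {si: sus for si, sus in active_sus_per_si(text).items() if len(sus) > 1}


# Three entries copied from the listing in this ticket.
sample = "\n".join([
    r"safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)",
    r"safSISU=safSu=TestApp_SU3\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)",
    r"safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=ACTIVE(1)",
])

print(multiple_active(sample))
```

Note that the sketch does not distinguish redundancy models; SIs of service groups that legitimately allow several active SUs would need to be filtered out by the caller.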
[tickets] [opensaf:tickets] #1784 Amfd asserts on clm locked controller after successfully taking active role as a part of failover
Amfd asserts for an invalid root cause entity here. CLM populating the invalid root cause entity as part of the callback is reported in ticket #1342.

---

** [tickets:#1784] Amfd asserts on clm locked controller after successfully taking active role as a part of failover**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Tue Apr 26, 2016 11:47 AM UTC by Ritu Raj
**Last Updated:** Tue Apr 26, 2016 11:47 AM UTC
**Owner:** nobody
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/1784/attachment/messages) (3.2 MB; application/octet-stream)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/1784/attachment/osafamfd) (7.4 MB; application/octet-stream)

Setup:
Changeset- 7436
Version - opensaf 5.0 FC

* Issue Observed :
Amfd asserts on clm locked controller after successfully taking active role as a part of failover.

* Steps To Reproduce:
1. OpenSAF running on 4 nodes, where SC-1 is Active, SC-2 Standby, and PL-3 and PL-4 are payloads.
2. Performed CLM lock of the standby controller (SC-2).
3. Now, perform failover on the active controller (SC-1).
4. Observed that amfd asserted on the CLM-locked controller (SC-2) and a cluster reset happened.

SLOT-2:~ #
Apr 26 14:56:06 SLOT-2 osafimmd[2199]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
...
Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Node Down event for node id 2010f:
Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Current role: STANDBY
...
Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO Peer down on node 0x2010f
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: WA IMMND DOWN on active controller 1 detected at standby immd!! 2. Possible failover
...
Apr 26 14:56:11 SLOT-2 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Apr 26 14:56:11 SLOT-2 osaffmd[2189]: NO Controller Failover: Setting role to ACTIVE
Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO RDE role set to ACTIVE
Apr 26 14:56:11 SLOT-2 osafrded[2180]: NO Running '/usr/lib64/opensaf/opensaf_sc_active' with 0 argument(s)
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osaflogd[2224]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osafntfd[2234]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osafclmd[2244]: NO ACTIVE request
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO FAILOVER StandBy --> Active
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO AVD NEW_ACTIVE, adest:1
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO ellect_coord invoke from rda_callback ACTIVE
Apr 26 14:56:11 SLOT-2 osafimmd[2199]: NO New coord elected, resides at 2020f
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO 2PBE configured, IMMSV_PBE_FILE_SUFFIX:.2020f (sync)
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO This IMMND is now the NEW Coord
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO SETTING COORD TO 1 CLOUD PROTO
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer disconnected 16 <139, 2020f> (@safAmfService2020f)
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 18 (safLogService) <126, 2020f>
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 19 (safAmfService) <139, 2020f>
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO Node 'SC-1' left the cluster
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: NO FAILOVER StandBy --> Active DONE!
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Apr 26 14:56:11 SLOT-2 osafntfimcnd[2419]: NO exiting on signal 15
..
Apr 26 14:56:11 SLOT-2 osafimmnd[2210]: NO Implementer connected: 27 (safSmfService) <337, 2020f>
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: ER Wrong rootCauseEntity �H�
Apr 26 14:56:11 SLOT-2 osafamfd[2254]: clm.cc:312: clm_track_cb: Assertion '0' failed.
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: WA AMF director unexpectedly crashed
Apr 26 14:56:11 SLOT-2 osafamfnd[2264]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Apr 26 14:56:11 SLOT-2 opensaf_reboot: Rebooting local node; timeout=60

* Syslog and amfd trace attached

Note: The issue is observed randomly.
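The syslog above prints the same node in two notations: decimal in the reboot message (`NodeId = 131599`) and hexadecimal elsewhere (`2020f`). A small sketch for correlating them follows. The helper names are hypothetical, and reading the second-lowest byte as the slot number is an assumption drawn only from the pairing of 2010f with SC-1 and 2020f with SLOT-2 in these logs.

```python
def node_id_to_hex(node_id):
    """Format a decimal node id the way the hexadecimal log lines print it."""
    return format(node_id, "x")


def slot_of(node_id):
    """Extract the second-lowest byte, assumed here to be the slot number."""
    return (node_id >> 8) & 0xFF


print(node_id_to_hex(131599))   # 2020f
print(slot_of(0x2010F), slot_of(0x2020F))
```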
[tickets] [opensaf:tickets] #1779 AMF: si-swap locked SI returns "error - command timed out"
There is already an existing ticket for this issue: https://sourceforge.net/p/opensaf/tickets/1294/

---

** [tickets:#1779] AMF: si-swap locked SI returns "error - command timed out"**

**Status:** unassigned
**Milestone:** 4.6.1
**Created:** Mon Apr 25, 2016 09:19 AM UTC by Quyen Dao
**Last Updated:** Mon Apr 25, 2016 10:16 AM UTC
**Owner:** nobody

cs: 7537:06ac24c4b9c3

**steps to reproduce**

immcfg -f AppConfig-2N.xml
amf-adm unlock-in safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
amf-adm unlock-in safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
amf-adm unlock safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
amf-adm unlock safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
amf-adm lock safSi=AmfDemo,safApp=AmfDemo1
amf-adm si-swap safSi=AmfDemo,safApp=AmfDemo1

**Result**

root@SC-1:/srv/shared# immcfg -f AppConfig-2N.xml
root@SC-1:/srv/shared# amf-adm unlock-in safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm unlock-in safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm unlock safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm unlock safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm lock safSi=AmfDemo,safApp=AmfDemo1
root@SC-1:/srv/shared# amf-adm si-swap safSi=AmfDemo,safApp=AmfDemo1
error - command timed out (alarm)
root@SC-1:/srv/shared#
[tickets] [opensaf:tickets] #1762 CLM : Healthy payloads are marked as Non-member nodes after failover
The issue is not observed when the same steps are performed on OpenSAF at changeset 7325, i.e., before ticket #1701.

---

** [tickets:#1762] CLM : Healthy payloads are marked as Non-member nodes after failover**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Thu Apr 14, 2016 11:08 AM UTC by Srikanth R
**Last Updated:** Fri Apr 22, 2016 10:17 AM UTC
**Owner:** nobody

Setup :
Changeset : 7436 5.0.FC
5 nodes cluster with Application deployed on PL-3 and PL-4.

Issue : Healthy payloads are marked as Non-member nodes after failover

Steps performed :

* Started opensaf on all the nodes, i.e. SC-1 to PL-5.
* Initially brought up the AMF application deployed on PL-3 and PL-4.
* Ran some tests on the setup including switchovers, failovers and CLM lock operations on PL-3 and PL-4.
* Restarted opensafd on PL-4. After the restart, AMF applications on PL-3 got the corresponding standby assignment as per expectation. Below is the trace from osafclmd.

Apr 14 14:15:45.621396 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit
Apr 14 14:15:56.548867 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster Join

* Similarly restarted opensafd on PL-3 and the AMF application came up fine.

Apr 14 14:16:00.890903 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit
Apr 14 14:21:41.602270 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster Join

* Now induced a failover by killing ckptd on the active controller SC-1.
* SC-2 took the active role.

Apr 14 14:21:44 CONTROLLER-2 osafamfd[22600]: NO FAILOVER StandBy --> Active

* But the two payloads PL-3 and PL-4 are marked as out of cluster by AMF. PL-5 is still part of the cluster.

Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-4' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: NO Node 'PL-3' left the cluster
Apr 14 14:21:45 CONTROLLER-2 osafamfd[22600]: WA avd_msg_sanity_chk: invalid node ID (2030f)

* Below is the trace from CLMD about the PL-3 & PL-4 exit, just after the active promotion.

Apr 14 14:21:45.009100 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit
Apr 14 14:21:45.136368 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit

* The AMF applications on PL-3 and PL-4 did not receive any csi removal callback during failover, but the AMF nodes are marked as disabled and the attribute saClmNodeIsMember of the CLM objects PL-3 and PL-4 is set to 0. Opensafd status doesn't show PL-3 and PL-4.
* The CLM APIs on PL-3 and PL-4 failed with ERR_UNAVAILABLE, but APIs of other services like CKPT and MQSV did not.
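The membership each node ends up with can be read off the osafclmd trace by replaying the Join and exit notifications in order. The sketch below assumes the trace line format quoted in this ticket; `final_membership` and the regular expression are illustrative helpers, not OpenSAF tooling.

```python
import re

# Matches the clms_ntf.c trace lines quoted in the ticket.
NTF_RE = re.compile(
    r"Notification for CLM node safNode=(?P<node>[^,]+),\S+ (?P<event>Join|exit)"
)


def final_membership(lines):
    """Replay notifications in order; the last one per node decides membership."""
    member = {}
    for line in lines:
        match = NTF_RE.search(line)
        if match:
            member[match.group("node")] = match.group("event") == "Join"
    return member


trace = [
    "Apr 14 14:15:45.621396 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit",
    "Apr 14 14:15:56.548867 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster Join",
    "Apr 14 14:16:00.890903 osafclmd [6745:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit",
    "Apr 14 14:21:41.602270 osafclmd [6745:clms_ntf.c:0142] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster Join",
    "Apr 14 14:21:45.009100 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-4,safCluster=myClmCluster exit",
    "Apr 14 14:21:45.136368 osafclmd [22590:clms_ntf.c:0180] TR Notification for CLM node safNode=PL-3,safCluster=myClmCluster exit",
]

print(final_membership(trace))
```

Replaying the quoted trace leaves both PL-3 and PL-4 as non-members, with the final exit notifications coming from the newly active clmd (pid 22590) rather than from the nodes actually leaving.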
[tickets] [opensaf:tickets] #1777 CLM : Admin state attrib of payload CLM object toggles after every switchover
---

** [tickets:#1777] CLM : Admin state attrib of payload CLM object toggles after every switchover**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Apr 22, 2016 11:38 AM UTC by Srikanth R
**Last Updated:** Fri Apr 22, 2016 11:38 AM UTC
**Owner:** nobody

Changeset: 7438
Opensaf version : 5.0.FC
Setup : 5 nodes with 3 payloads.

Issue : CLM Admin state attribute of payload toggles after every switchover

Steps performed :

* All the nodes are brought up initially.
* Continuous switchovers are triggered.
* Just after SC-1 became active, a CLM lock operation is issued on PL-4.
* The CLM lock operation succeeded, with the following errors in syslog.

Apr 22 13:09:01 SYSTEST-CNTLR-1 osafrded[3947]: NO Got peer info request from node 0x2020f with role STANDBY
Apr 22 13:09:01 SYSTEST-CNTLR-1 osafrded[3947]: NO Got peer info response from node 0x2020f with role STANDBY
...
Apr 22 13:09:24 SYSTEST-CNTLR-1 osafclmd[4028]: NO safNode=PL-4,safCluster=myClmCluster LEFT, view number=17
Apr 22 13:09:24 SYSTEST-CNTLR-1 osafclmd[4028]: ER Sending track callback failed
Apr 22 13:09:24 SYSTEST-CNTLR-1 osafclmd[4028]: ER Sending track callback failed
Apr 22 13:09:24 SYSTEST-CNTLR-1 osafclmd[4028]: ER Sending track callback failed
Apr 22 13:09:24 SYSTEST-CNTLR-1 osafclmd[4028]: ER Sending track callback failed
Apr 22 13:09:24 SYSTEST-CNTLR-1 osafclmd[4028]: ER clms_clmresp_ok failed

* Now, if a switchover is invoked and SC-2 becomes active, PL-4's admin state is updated to UNLOCKED without any admin operation on the PL-4 CLM object.
* At this stage, with every switchover the admin state of the payload's CLM object toggles between LOCKED (when SC-1 is active) and UNLOCKED (when SC-2 is active).
[tickets] [opensaf:tickets] #1762 CLM : Healthy payloads are marked as Non-member nodes after failover
Below are the definitive steps to reproduce the issue.

* Bring up four nodes in the cluster with SC-1 as the active controller.
* Issue two switchovers.
* Perform a lock operation on one of the payloads, say PL-4.
* Perform an unlock operation on the locked payload.
* Restart opensafd on the payload.
* Perform a failover by killing any director process on the active controller (SC-1).
* The standby takes the active role and, as part of the transition, updates the unlocked PL-4 as out of cluster.

Change set version : 7436 ( 5.0.FC)

---

** [tickets:#1762] CLM : Healthy payloads are marked as Non-member nodes after failover**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Thu Apr 14, 2016 11:08 AM UTC by Srikanth R
**Last Updated:** Thu Apr 14, 2016 11:53 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #1765 saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover
Irrespective of the order of the callbacks, this issue should not be observed.

---

** [tickets:#1765] saCkptCheckpointOpen api call failed and returing SA_AIS_ERR_LIBRARY after couple of failover**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Fri Apr 15, 2016 06:26 AM UTC by Ritu Raj
**Last Updated:** Wed Apr 20, 2016 06:03 AM UTC
**Owner:** Pham Hoang Nhat
**Attachments:**

- [ckpt_trace.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1765/attachment/ckpt_trace.tar.bz2) (3.2 MB; application/x-bzip)

Setup:
Changeset: 7436
Version: opensaf 5.0 FC
4 nodes configured with a single PBE and a load of 30K objects

* Issue observed: the saCkptCheckpointOpen API call failed, returning SA_AIS_ERR_LIBRARY, after a couple of failovers.

* Steps to reproduce:
> Ran a couple of failovers and observed that saCkptCheckpointOpen failed.
> Below is the snippet of the agent trace:

Apr 15 8:08:50.275115 cpa [28883:cpa_mds.c:0776] << cpa_mds_msg_sync_send: retval = 1
Apr 15 8:08:50.275128 cpa [28883:cpa_api.c:1043] T4 Cpa CkptOpen failed with return value:2,ckptHandle:63
Apr 15 8:08:50.275141 cpa [28883:cpa_api.c:1146] << **saCkptCheckpointOpen: API return code = 2**

> Traces of both controllers and the agent trace of the payload are attached.
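The traces in these tickets report AIS return codes as bare numbers (for example `API return code = 2` above). As a reading aid, here is a minimal sketch that maps the numeric codes back to their symbolic names. The table is a subset of the `SaAisErrorT` enumeration from the SA Forum AIS headers; the helper names `decode_ais_rc` and `rc_from_trace` are hypothetical and not part of OpenSAF.

```python
import re

# Subset of the SaAisErrorT enumeration (SA Forum AIS); extend as needed.
AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    3: "SA_AIS_ERR_VERSION",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    7: "SA_AIS_ERR_INVALID_PARAM",
    8: "SA_AIS_ERR_NO_MEMORY",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    31: "SA_AIS_ERR_UNAVAILABLE",
}


def decode_ais_rc(rc: int) -> str:
    """Return the symbolic name for a numeric AIS return code."""
    return AIS_ERRORS.get(rc, f"unknown AIS return code {rc}")


def rc_from_trace(line: str):
    """Extract and decode the code from an 'API return code = N' trace line.

    Returns None when the line does not contain such a field.
    """
    match = re.search(r"API return code = (\d+)", line)
    return decode_ais_rc(int(match.group(1))) if match else None


trace = ("Apr 15 8:08:50.275141 cpa [28883:cpa_api.c:1146] << "
         "**saCkptCheckpointOpen: API return code = 2**")
print(rc_from_trace(trace))
```

The same table applies to the other numeric codes quoted in this thread, such as `rc - 6` from saMsgQueueOpen and `Failed to Initialize with CLM: 8`.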
[tickets] [opensaf:tickets] #1770 AMF : amfnd segfaulted during su failover escalation
Traces of amfnd and syslog from the node hosting the SU are attached, along with the application configuration.

Attachments:

- [1770.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/fa42a0c9/d65e/attachment/1770.tgz) (393.8 kB; application/x-compressed-tar)

---

** [tickets:#1770] AMF : amfnd segfaulted during su failover escalation**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue Apr 19, 2016 06:53 AM UTC by Srikanth R
**Last Updated:** Tue Apr 19, 2016 06:53 AM UTC
**Owner:** nobody

Setup: 5-node cluster with 3 payloads
Changeset: 7438 (opensaf 5.0.FC)
Application: 2N with 5 SUs (si-si deps enabled and su failover flag enabled)

Issue: The AMFND hosting the faulty SU segfaulted during SU failover escalation as part of an SU lock operation.

Steps performed:

-> Initially bring up the application and ensure that it is fully assigned.
-> Perform one fault operation on the SU hosting the active assignment, in such a way that the next fault is escalated to SU failover.
-> Perform a lock operation on the SU hosting the active assignment.
-> Do not respond to the CSI removal callback, so that this fault is escalated to SU failover.
-> AMFND segfaulted with the following bt file:

signal: 11 pid: 320 uid: 0
/usr/lib64/libopensaf_core.so.0(+0x1fd9d)[0x7f1d79294d9d]
/lib64/libpthread.so.0(+0xf7c0)[0x7f1d782b67c0]
/usr/lib64/opensaf/osafamfnd[0x43b1ff]
/usr/lib64/opensaf/osafamfnd[0x417f89]
/usr/lib64/opensaf/osafamfnd[0x408469]
/usr/lib64/opensaf/osafamfnd[0x42c65a]
/usr/lib64/opensaf/osafamfnd[0x42c4a0]
/usr/lib64/opensaf/osafamfnd[0x42b979]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f1d77ac1c36]
/usr/lib64/opensaf/osafamfnd[0x405f29]

-> Below is the entry in the osafamfnd trace:

Apr 19 11:23:44.684918 osafamfnd [29522:clc.cc:0870] T1 'safComp=COMP2SU5TWONAPP,safSu=SU5,safSg=SGONE,safApp=TWONAPP':FSM Enter presence state: 'SA_AMF_PRESENCE_TERMINATING(4)':FSM Exit presence state:SA_AMF_PRESENCE_TERMINATING(4)
Apr 19 11:23:44.684924 osafamfnd [29522:clc.cc:0889] << avnd_comp_clc_fsm_run: 1
Apr 19 11:23:44.684930 osafamfnd [29522:err.cc:1120] << avnd_err_su_repair: retval=1
Apr 19 11:23:44.684936 osafamfnd [29522:susm.cc:0255] >> avnd_su_siq_prc: SU 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.684942 osafamfnd [29522:susm.cc:0260] << avnd_su_siq_prc
Apr 19 11:23:44.684947 osafamfnd [29522:susm.cc:1176] << avnd_su_si_oper_done: 1
Apr 19 11:23:44.684953 osafamfnd [29522:comp.cc:1822] << avnd_comp_csi_remove_done: 1
Apr 19 11:23:44.684959 osafamfnd [29522:comp.cc:1321] << avnd_comp_csi_remove: 1
Apr 19 11:23:44.685055 osafamfnd [29522:comp.cc:1678] >> all_csis_in_removed_state: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP'
Apr 19 11:23:44.685064 osafamfnd [29522:comp.cc:1691] << all_csis_in_removed_state: 1
Apr 19 11:23:44.685070 osafamfnd [29522:susm.cc:1021] >> avnd_su_si_oper_done: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685076 osafamfnd [29522:susm.cc:0845] >> susi_operation_in_progress: 'safSu=SU5,safSg=SGONE,safApp=TWONAPP' '(null)'
Apr 19 11:23:44.685082 osafamfnd [29522:susm.cc:0890] << susi_operation_in_progress: 1
Apr 19 11:23:44.685096 osafamfnd [29522:err.cc:1586] >> is_no_assignment_due_to_escalations
Apr 19 11:23:44.685102 osafamfnd [29522:err.cc:1591] << is_no_assignment_due_to_escalations: true
Apr 19 11:24:51.153931 osafamfnd [2500:ncs_main_pub.c:0223] TR NCS:PROCESS_ID=2500
[tickets] [opensaf:tickets] #1757 Standby controller failed to join the cluster probably because of setup issues
Please note that there is a time difference of close to 94 seconds between the two controllers.

Attachments:

- [1757.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/bf13/00e0/attachment/1757.tgz) (2.5 MB; application/x-compressed-tar)

---

** [tickets:#1757] Standby controller failed to join the cluster probably because of setup issues**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 13, 2016 11:12 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 18, 2016 09:09 AM UTC
**Owner:** nobody

*Setup:
Changeset: 7436
Version: opensaf 5.0FC
OS: SUSE 11SP2 x86_64

*Issue observed: The standby controller failed to join the cluster with the error message "ER Failed to Initialize with CLM".

*Steps to reproduce:
> OpenSAF is already up and running on controller1 (SC-1).
> When OpenSAF was started on controller2 (SC-2), it failed with the following message:

SCALE_SLOT-2:~ # /etc/init.d/opensafd start
Apr 26 20:11:28 SCALE_SLOT-2 opensafd: Starting OpenSAF Services(5.0.FC - ) (Using TIPC)
Starting OpenSAF Services (Using TIPC):Apr 26 20:11:28 SCALE_SLOT-2 kernel: [1930938.251473] TIPC: Activated (version 2.0.0)
...
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: Started
**Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER Failed to Initialize with CLM: 8**
**Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: ER avnd_create failed**
Apr 26 20:11:29 SCALE_SLOT-2 osafamfnd[29911]: NO exiting

> The corresponding syslog of the active controller (SC-1) at that time:

Apr 26 20:08:51 SCALE_SLOT-1 osafclmd[31692]: WA FAILED:** ncs_patricia_tree_add, client_id** 53
Apr 26 20:08:51 SCALE_SLOT-1 osafamfd[31702]: NO Node 'SC-2' left the cluster

>> It is also observed that on the active controller (SC-1) there is no osafclmd log record for the period in which controller2 (SC-2) failed, while other services have log records at that time stamp.

Below is the output of osafclmd (SC-1). Between the time stamps "Apr 26 20:08:51.237701" and "Apr 26 20:12:06.272871", osafclmd did not log anything.

Apr 26 20:08:51.237695 osafclmd [31692:clms_evt.c:1601] << process_api_evt
**Apr 26 20:08:51.237701 osafclmd [31692:clms_evt.c:1667] << clms_process_mbx**
**Apr 26 20:12:06.272871 osafclmd [31692:ava_mds.c:0179] >> ava_mds_cbk**
Apr 26 20:12:06.272923 osafclmd [31692:ava_mds.c:0530] >> ava_mds_flat_dec

Note:
1. This is a random issue.
2. The time gap between controller1 (SC-1) and controller2 (SC-2) is 3 min.
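The observation that osafclmd logged nothing between two time stamps can be checked mechanically. The following is a minimal sketch, not an OpenSAF tool: it assumes trace lines begin with a stamp of the form `Apr 26 20:08:51.237701` (optionally preceded by the `**` emphasis markers seen in this ticket), and it assumes a year, since the trace does not carry one. The function names and the 60-second threshold are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Leading '**' markers are tolerated because the ticket text bolds some lines.
STAMP = re.compile(r"^\**([A-Z][a-z]{2} +\d{1,2} \d{1,2}:\d{2}:\d{2}\.\d{6})")


def parse_stamp(line, year=2016):
    """Parse the leading trace time stamp; the year is an assumption."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S.%f")


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous, current, delta) for consecutive stamps further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        current = parse_stamp(line)
        if current is None:
            continue  # skip lines without a recognisable time stamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


trace = [
    "Apr 26 20:08:51.237695 osafclmd [31692:clms_evt.c:1601] << process_api_evt",
    "**Apr 26 20:08:51.237701 osafclmd [31692:clms_evt.c:1667] << clms_process_mbx",
    "Apr 26 20:12:06.272871 osafclmd [31692:ava_mds.c:0179] >> ava_mds_cbk**",
    "Apr 26 20:12:06.272923 osafclmd [31692:ava_mds.c:0530] >> ava_mds_flat_dec",
]
for start, end, delta in find_gaps(trace):
    print(f"silent for {delta.total_seconds():.6f} s between {start} and {end}")
```

For the four lines quoted in the ticket, the only gap above the threshold is the one between `clms_process_mbx` and `ava_mds_cbk`, roughly 195 seconds.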
[tickets] [opensaf:tickets] #1757 Standby controller failed to join the cluster probably because of setup issues
- **Version**: 5.0 FC --> 5.0.FC
- **Priority**: minor --> major
- **Comment**:

This issue is observed again on a different setup after 4 failovers.

Setup:
Changeset: 7436 (5.0.FC)
5-node cluster with 150K PBE objects

On the standby controller:

Apr 18 12:57:50 SYSTEST-CNTLR-2 osafamfnd[2535]: Started
Apr 18 12:57:50 SYSTEST-CNTLR-2 osafamfnd[2535]: ER Failed to Initialize with CLM: 8
Apr 18 12:57:50 SYSTEST-CNTLR-2 osafamfnd[2535]: ER avnd_create failed
Apr 18 12:57:50 SYSTEST-CNTLR-2 osafamfnd[2535]: NO exiting

On the active controller:

Apr 18 12:59:24 SYSTEST-CNTLR-1 osafimmnd[2298]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Apr 18 12:59:25 SYSTEST-CNTLR-1 osafclmd[2328]: WA FAILED: ncs_patricia_tree_add, client_id 70
Apr 18 12:59:25 SYSTEST-CNTLR-1 osafamfd[2338]: NO Node 'SC-2' left the cluster

Please note that this type of issue was not seen earlier; failovers used to run smoothly on this setup with OpenSAF versions 4.6 and 4.7.

---

** [tickets:#1757] Standby controller failed to join the cluster probably because of setup issues**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 13, 2016 11:12 AM UTC by Ritu Raj
**Last Updated:** Fri Apr 15, 2016 06:32 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #1762 CLM : Healthy payloads are marked as Non-member nodes after failover
Traces of clmd, amfd and immnd on both controllers, along with the syslog of all nodes, are attached.

Attachments:

- [1762.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/e65ee23f/1931/attachment/1762.tgz) (3.9 MB; application/x-compressed-tar)

---

** [tickets:#1762] CLM : Healthy payloads are marked as Non-member nodes after failover**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Thu Apr 14, 2016 11:08 AM UTC by Srikanth R
**Last Updated:** Thu Apr 14, 2016 11:08 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #1275 AMF: SG is in unstable state ( standby csi removal timeout during sponsor si lock )
The first scenario is reproduced with changeset 7236.

App config: 2n red, 2 SUs with 4 COMPs, 1 sponsor with 3 dependent SIs, su restart failover flag disabled.

Please find attached the amfd trace and the application creation script. Below are the steps followed.

* Ensure that all the SIs are assigned.
* Lock the sponsor SI.
* During the lock operation on the sponsor SI, the component (COMP1 in SU2) hosting the standby assignment doesn't respond to the CSI removal callback.

Attachments:

- [1275_issue1.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/ae1313a5/5dea/attachment/1275_issue1.tgz) (41.8 kB; application/x-compressed-tar)

---

** [tickets:#1275] AMF: SG is in unstable state ( standby csi removal timeout during sponsor si lock )**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Mar 19, 2015 01:48 PM UTC by Srikanth R
**Last Updated:** Mon Nov 02, 2015 09:23 AM UTC
**Owner:** nobody

*Setup*
Version: 4.6 FC
Model: 2n
Configuration: 1 App, 1 SG, 2 SUs with 4 comps each, 4 SIs with 1 CSI each
si-si deps configured with SI1 as sponsor to SI2, SI3 and SI4.
SU1 is mapped to pl-3 and SU2 to pl-4
saAmfSGAutoRepair=1(True)
SuFailover=0(False)
component recovery policy - 3 (comp failover)

*Initial state*
All the AMF entities of the application are in the unlocked state. The SIs are fully assigned.

*Issue*
SG is in unstable state (standby CSI removal timeout during sponsor SI lock).

*Steps Performed*
-> Before performing the lock operation on the sponsor SI, ensured that component 1 in SU2 (the standby SU) does not respond to the CSI removal callback.
-> The SG went to unstable state after the lock operation on the sponsor SI.

Below are the logs on PL-4 (where the standby SU is hosted):

Mar 19 19:05:11 SYSTEST-PLD-2 osafamfnd[24560]: NO Removed 'safSi=SI1,safApp=test2nApp' from 'safSu=SU2,safSg=SG,safApp=test2nApp'
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: NO Removed 'safSi=SI2,safApp=test2nApp' from 'safSu=SU2,safSg=SG,safApp=test2nApp'
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=SG,safApp=test2nApp : SI=safSi=SI3,safApp=test2nApp
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=SG,safApp=test2nApp : SI=safSi=SI4,safApp=test2nApp

Below is the final state of the SIs after the lock operation.

safSi=SI1,safApp=test2nApp
  saAmfSIAdminState=LOCKED(2)
  saAmfSIAssignmentState=UNASSIGNED(1)
safSi=SI2,safApp=test2nApp
  saAmfSIAdminState=UNLOCKED(1)
  saAmfSIAssignmentState=UNASSIGNED(1)
safSi=SI3,safApp=test2nApp
  saAmfSIAdminState=UNLOCKED(1)
  saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=SI4,safApp=test2nApp
  saAmfSIAdminState=UNLOCKED(1)
  saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
[tickets] [opensaf:tickets] #1756 AMF : amfd on controller asserted ( for CSI removal timeout during application si lock )
--- ** [tickets:#1756] AMF : amfd on controller asserted ( for CSI removal timeout during application si lock )** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Wed Apr 13, 2016 11:00 AM UTC by Srikanth R **Last Updated:** Wed Apr 13, 2016 11:00 AM UTC **Owner:** nobody **Attachments:** - [1755.tgz](https://sourceforge.net/p/opensaf/tickets/1756/attachment/1755.tgz) (681.2 kB; application/x-compressed-tar) Changeset : 7436 Version : 5.0 FC Setup : Controller with 2 payloads. 2n red model with 2 SUs, 4 SIs and no si-si deps. Steps performed : -> Initially the application is brought up and all the SIs are fully assigned. -> LPerformed shutdown operation on one of the SI .i.e SI4. -> Ensured that application with active assignment shall time out in CSI removal callback. The shutdown operation timed out and the amfd on active controller asserted. Invoking admin operation SHUTDOWN on safSi=TestApp_SI4,safApp=TestApp_TwoN OP RETURN VALUE and AIS OP RETURN VAL = 5 -65536 Apr 13 16:17:40 CONTROLLER-2 osafamfd[2689]: sg_2n_fsm.cc:125: avd_su_fsm_state_determine: Assertion '0' failed. Apr 13 16:17:40 CONTROLLER-2 osafamfnd[2699]: WA AMF director unexpectedly crashed Apr 13 16:17:40 CONTROLLER-2 osafamfnd[2699]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 6 --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- Find and fix application performance issues faster with Applications Manager Applications Manager provides deep performance insights into multiple tiers of your business applications. It resolves application problems quickly and reduces your MTTR. Get your free trial! 
[tickets] [opensaf:tickets] #1748 MQSV service is not working ( Queue open does not succeed)
- Description has changed:

Diff:

--- old
+++ new
@@ -86,4 +86,4 @@
 100|0| Return Value: SA_AIS_ERR_TIMEOUT
-3)
+Traces of msgd, msgnd and test application are attached

- Attachments has changed:

Diff:

--- old
+++ new
@@ -0,0 +1 @@
+mqsv.tgz (109.2 kB; application/x-compressed-tar)

---

** [tickets:#1748] MQSV service is not working ( Queue open does not succeed) **

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Tue Apr 12, 2016 03:49 AM UTC by Srikanth R
**Last Updated:** Tue Apr 12, 2016 03:49 AM UTC
**Owner:** nobody
**Attachments:**

- [mqsv.tgz](https://sourceforge.net/p/opensaf/tickets/1748/attachment/mqsv.tgz) (109.2 kB; application/x-compressed-tar)

Changeset : 7436
Version : 5.0 FC

Issue :

1) Queue opening fails with TRY_AGAIN / TIME_OUT.

Below is the output of the sample application:

DEMO SCENARIO#1: Receiving messages via Sync API - saMsgMessageGet START
MQSV:MQA:ONsaMsgQueueOpen failed with rc - 6
Press Enter Key to Continue...

Below is the output of the test application, which handles TRY_AGAIN with a retry mechanism:

100|0| RETRY : saMsgQueueOpen with valid params - Non Persistent
100|0| Return Value: SA_AIS_ERR_TRY_AGAIN
100|0|
100|0| Retry Count : 10
100|0| Retry Count : 20
100|0| Retry Count : 30
100|0| Retry Count : 40
100|0| Try again count exceeded

In the async case, the queue open callback returns TRY_AGAIN.

100|0| Queue name : safMq=nonpersistent_Q_37
100|0| size: 1000
100|0| creation flags : SA_MSG_QUEUE_NON_PERSISTENT
100|0| open flags : SA_MSG_QUEUE_CREATE
100|0| SUCCESS : saMsgQueueOpenAsync with valid parameters - Non Persistent
100|0| Return Value: SA_AIS_OK
100|0| Invocation : 115
100|0|
100|0|
100|0| --- Queue Open Callback -
100|0| Error String : SA_AIS_ERR_TRY_AGAIN
100|0| Invocation : 115
100|0| ---

Below is the output in syslog:

Apr 10 20:31:21 CONTROLLER-2 osafmsgnd[13195]: ER ERR_TRY_AGAIN: Timeout occurs Unable to send the respons in async case
Apr 10 20:31:21 CONTROLLER-2 osafmsgnd[13195]: ER The procedure to open the Queue Failed with err 6
Apr 10 20:31:21 CONTROLLER-2 osafmsgd[13213]: ER Sending the message to the specified destination with error 6
Apr 10 20:31:21 CONTROLLER-2 osafmsgd[13213]: ER ERR_FAILED_OPERATION: Couldn't Send ASAPi Name Resolution Response Message

2) saMsgInitialize and saMsgFinalize return TIME_OUT, which was not observed earlier on an idle system.

100|0| * Create a queue with zero retention time using saMsgQueueOpen *
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0| FAILED : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_ERR_TIMEOUT
100|0|
100|0|
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0| SUCCESS : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_OK
100|0| Version : B.3.1
100|0| SUCCESS : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_OK
100|0| Message Handle : 6876704
100|0| Version Output : B.3.1
100|0| FAILED : saMsgFinalize with all valid parameters
100|0| Return Value: SA_AIS_ERR_TIMEOUT

Traces of msgd, msgnd and test application are attached
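Note on the retry handling reported in #1748: the test application output shows saMsgQueueOpen being retried on SA_AIS_ERR_TRY_AGAIN until a retry limit is reached. The test application's source is not included in the ticket text, so the following is only a minimal sketch of such a bounded retry loop, not the reporter's code. Assumptions: `call_with_retry`, `retry_fn`, `stub_state` and `stub_queue_open` are hypothetical names introduced here; the stub stands in for a real saMsgQueueOpen() call so the sketch compiles without the OpenSAF headers; the three SaAisErrorT values are redeclared locally (in a real program they come from saAis.h).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Redeclared locally so the sketch is self-contained; a real program
 * would take these from saAis.h. Value 6 matches the "rc - 6" above. */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_TIMEOUT = 5,
    SA_AIS_ERR_TRY_AGAIN = 6
} SaAisErrorT;

/* Hypothetical callback type: wraps the call being retried, e.g. a
 * saMsgQueueOpen() invocation with its arguments kept in ctx. */
typedef SaAisErrorT (*retry_fn)(void *ctx);

/* Call fn once, then repeat it while it returns SA_AIS_ERR_TRY_AGAIN,
 * for at most max_retries further attempts, sleeping delay_ms between
 * attempts. Returns the last return code seen. */
static SaAisErrorT call_with_retry(retry_fn fn, void *ctx,
                                   unsigned max_retries, long delay_ms)
{
    assert(fn != NULL);
    SaAisErrorT rc = fn(ctx);
    for (unsigned attempt = 0;
         rc == SA_AIS_ERR_TRY_AGAIN && attempt < max_retries;
         ++attempt) {
        struct timespec ts = { delay_ms / 1000,
                               (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        rc = fn(ctx);
    }
    return rc;
}

/* Hypothetical stub in place of the real API call: returns TRY_AGAIN
 * for the first busy_calls_left calls, then SA_AIS_OK. */
struct stub_state {
    unsigned busy_calls_left;
    unsigned calls;
};

static SaAisErrorT stub_queue_open(void *ctx)
{
    struct stub_state *s = ctx;
    s->calls++;
    if (s->busy_calls_left > 0) {
        s->busy_calls_left--;
        return SA_AIS_ERR_TRY_AGAIN;
    }
    return SA_AIS_OK;
}
```

With a loop like this, a caller that still gets SA_AIS_ERR_TRY_AGAIN back after the limit is in the "Try again count exceeded" situation shown in the test output. The loop only bounds the client-side wait and says nothing about why the service keeps answering TRY_AGAIN.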
[tickets] [opensaf:tickets] #1748 MQSV service is not working ( Queue open does not succeed)
---

** [tickets:#1748] MQSV service is not working ( Queue open does not succeed) **

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Tue Apr 12, 2016 03:49 AM UTC by Srikanth R
**Last Updated:** Tue Apr 12, 2016 03:49 AM UTC
**Owner:** nobody

Changeset : 7436
Version : 5.0 FC

Issue :

1) Queue opening fails with TRY_AGAIN / TIME_OUT.

Below is the output of the sample application:

DEMO SCENARIO#1: Receiving messages via Sync API - saMsgMessageGet START
MQSV:MQA:ONsaMsgQueueOpen failed with rc - 6
Press Enter Key to Continue...

Below is the output of the test application, which handles TRY_AGAIN with a retry mechanism:

100|0| RETRY : saMsgQueueOpen with valid params - Non Persistent
100|0| Return Value: SA_AIS_ERR_TRY_AGAIN
100|0|
100|0| Retry Count : 10
100|0| Retry Count : 20
100|0| Retry Count : 30
100|0| Retry Count : 40
100|0| Try again count exceeded

In the async case, the queue open callback returns TRY_AGAIN.

100|0| Queue name : safMq=nonpersistent_Q_37
100|0| size: 1000
100|0| creation flags : SA_MSG_QUEUE_NON_PERSISTENT
100|0| open flags : SA_MSG_QUEUE_CREATE
100|0| SUCCESS : saMsgQueueOpenAsync with valid parameters - Non Persistent
100|0| Return Value: SA_AIS_OK
100|0| Invocation : 115
100|0|
100|0|
100|0| --- Queue Open Callback -
100|0| Error String : SA_AIS_ERR_TRY_AGAIN
100|0| Invocation : 115
100|0| ---

Below is the output in syslog:

Apr 10 20:31:21 CONTROLLER-2 osafmsgnd[13195]: ER ERR_TRY_AGAIN: Timeout occurs Unable to send the respons in async case
Apr 10 20:31:21 CONTROLLER-2 osafmsgnd[13195]: ER The procedure to open the Queue Failed with err 6
Apr 10 20:31:21 CONTROLLER-2 osafmsgd[13213]: ER Sending the message to the specified destination with error 6
Apr 10 20:31:21 CONTROLLER-2 osafmsgd[13213]: ER ERR_FAILED_OPERATION: Couldn't Send ASAPi Name Resolution Response Message

2) saMsgInitialize and saMsgFinalize return TIME_OUT, which was not observed earlier on an idle system.

100|0| * Create a queue with zero retention time using saMsgQueueOpen *
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0| FAILED : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_ERR_TIMEOUT
100|0|
100|0|
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0|
100|0| Version : B.3.1
100|0|MQSV:MQA:ON
100|0| SUCCESS : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_OK
100|0| Version : B.3.1
100|0| SUCCESS : saMsgInitialize with all valid parameters
100|0| Return Value: SA_AIS_OK
100|0| Message Handle : 6876704
100|0| Version Output : B.3.1
100|0| FAILED : saMsgFinalize with all valid parameters
100|0| Return Value: SA_AIS_ERR_TIMEOUT

3)