[tickets] [opensaf:tickets] #242 cpsv : ckptnd crashed while running multi thread application during section iteration get next
- **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#242] cpsv : ckptnd crashed while running multi thread application during section iteration get next** **Status:** assigned **Milestone:** 4.5.2 **Created:** Thu May 16, 2013 06:31 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu Aug 06, 2015 04:23 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [checkpoint_app1.c](https://sourceforge.net/p/opensaf/tickets/242/attachment/checkpoint_app1.c) (12.9 kB; application/octet-stream)

from http://devel.opensaf.org/ticket/2864

The issue is seen on SLES 64-bit VMs. There are two threads in the application, a writer thread and a reader thread.

The writer thread does the following:
1) Creates the checkpoint
2) In a loop, opens the same checkpoint in write mode, creates a section, writes into the section and closes the checkpoint

The reader thread does the following:
1) In a loop, opens the checkpoint created by the writer thread, does a section iteration initialize, reads the section returned by the section descriptor of iterationNext() and closes the checkpoint

Backtrace observed:

(gdb) bt
#0 0x00417606 in cpnd_proc_fill_sec_desc (pTmpSecPtr=0x0, sec_des=0x7fffa9c28530) at cpnd_proc.c:1637
#1 0x00417b42 in cpnd_proc_getnext_section (cp_node=0x64a810, get_next=0x654bb0, sec_des=0x7fffa9c28530, n_secs_trav=0x7fffa9c2852c) at cpnd_proc.c:1756
#2 0x0040f680 in cpnd_evt_proc_ckpt_iter_getnext (cb=0x637f30, evt=0x654ba0, sinfo=0x6551f8) at cpnd_evt.c:4122
#3 0x004059df in cpnd_process_evt (evt=0x654b90) at cpnd_evt.c:241
#4 0x00411619 in cpnd_main_process (cb=0x637f30) at cpnd_init.c:544
#5 0x004118e3 in main (argc=1, argv=0x7fffa9c28e68) at cpnd_main.c:72
(gdb) fr 2
#2 0x0040f680 in cpnd_evt_proc_ckpt_iter_getnext (cb=0x637f30, evt=0x654ba0, sinfo=0x6551f8) at cpnd_evt.c:4122
4122 cpnd_evt.c: No such file or directory.
in cpnd_evt.c (gdb) p *evt $1 = {dont_free_me = false, error = 0, type = CPND_EVT_A2ND_CKPT_ITER_GETNEXT, info = {initReq = {version = {releaseCode = 51 '3', majorVersion = 0 '\0', minorVersion = 0 '\0'}}, finReq = {client_hdl = 51}, openReq = {client_hdl = 51, lcl_ckpt_hdl = 11, ckpt_name = {length = 61664, value = d\000\000\000\000\000�\202a\000\000\000\000\000\005\000\000\000\t, '\0' repeats 236 times}, ckpt_attrib = {creationFlags = 0, checkpointSize = 0, retentionDuration = 0, maxSections = 0, maxSectionSize = 0, maxSectionIdSize = 0}, ckpt_flags = 0, invocation = 0, timeout = 0}, closeReq = {client_hdl = 51, ckpt_id = 11, ckpt_flags = 6615264}, ulinkReq = {ckpt_name = {length = 51, value = \000\000\000\000\000\000\v\000\000\000\000\000\000\000��d\000\000\000\000\000�\202a\000\000\000\000\000\005\000\000\000\t, '\0' repeats 220 times}}, rdsetReq = {ckpt_id = 51, reten_time = 11}, arsetReq = {ckpt_id = 51}, statReq = {ckpt_id = 51}, refCntsetReq = {no_of_nodes = 51, ref_cnt_array = {{ckpt_id = 11, ckpt_ref_cnt = 6615264}, {ckpt_id = 6390432, ckpt_ref_cnt = 5}, { ckpt_id = 0, ckpt_ref_cnt = 0} repeats 98 times}}, sec_creatReq = {ckpt_id = 51, lcl_ckpt_id = 11, agent_mdest = 6615264, sec_attri = {sectionId = 0x6182a0, expirationTime = 38654705669}, init_data = 0x0, init_size = 0}, sec_delReq = {ckpt_id = 51, sec_id = {idLen = 11, id = 0x64f0e0 section_4_1}, lcl_ckpt_id = 6390432, agent_mdest = 38654705669}, sec_expset = {ckpt_id = 51, sec_id = {idLen = 11, id = 0x64f0e0 section_4_1}, exp_time = 6390432}, iter_getnext = {ckpt_id = 51, section_id = {idLen = 11, id = 0x64f0e0 section_4_1}, iter_id = 6390432, filter = SA_CKPT_SECTIONS_ANY, n_secs_trav = 9, exp_tmr = 0}, arr_ntfy = { client_hdl = 51}, ckpt_write = {type = 51, ckpt_id = 11, lcl_ckpt_id = 6615264, agent_mdest = 6390432, num_of_elmts = 5, all_repl_evt_flag = 9, data = 0x0, seqno = 0, last_seq = 0 '\0', ckpt_sync = {ckpt_id = 0, lcl_ckpt_hdl = 0, client_hdl = 0, invocation = 0, cpa_sinfo = {to_svc = 0, 
dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {length = 0 '\0', data = '\0' repeats 11 times}}, is_ckpt_open = false}}, ckpt_read = {type = 51, ckpt_id = 11, lcl_ckpt_id = 6615264, agent_mdest = 6390432, num_of_elmts = 5, all_repl_evt_flag = 9, data = 0x0, seqno = 0, last_seq = 0 '\0', ckpt_sync = {ckpt_id = 0, lcl_ckpt_hdl = 0, client_hdl = 0, invocation = 0, cpa_sinfo = {to_svc = 0, dest = 0, stype = MDS_SENDTYPE_SND, ctxt = { length = 0 '\0', data = '\0' repeats 11 times}}, is_ckpt_open = false}}, ckpt_sync = {ckpt_id = 51, lcl_ckpt_hdl = 11, client_hdl = 6615264, invocation = 6390432, cpa_sinfo = {to_svc = 5, dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {length = 0 '\0', data = '\0' repeats 11 times}}, is_ckpt_open = false}, ckpt_read_ack = {ckpt_id = 51, mds_dest = 11}, ckpt_info = {error = 51, ckpt_id = 11, is_active_exists = 224, active_dest = 6390432, dest_cnt = 5, dest_list = 0x0, attributes = {creationFlags = 0, checkpointSize = 0, retentionDuration = 0, maxSections = 0, maxSectionSize = 0,
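Frame #0 shows cpnd_proc_fill_sec_desc being entered with pTmpSecPtr=0x0: the reader's iteration getnext dereferences a section record that the concurrent writer thread has already deleted. A minimal sketch of the defensive shape, with hypothetical names (this is not the actual OpenSAF fix):

```c
#include <stddef.h>

/* Hypothetical stand-in for the section record behind pTmpSecPtr */
struct sec_info {
    int sec_id;
};

/* Sketch: guard against a section that a concurrent writer deleted
 * between lookup and fill; return an error instead of dereferencing
 * a NULL pointer (the caller could map this to SA_AIS_ERR_NO_SECTIONS). */
static int fill_sec_desc(const struct sec_info *sec, int *out_id)
{
    if (sec == NULL || out_id == NULL)
        return -1;          /* section vanished under us; report, don't crash */
    *out_id = sec->sec_id;  /* safe: sec is non-NULL here */
    return 0;
}
```

Whether the proper fix is a NULL guard or serializing the iteration against section deletion is a design choice for cpnd; the sketch only shows the failing dereference made safe.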
[tickets] [opensaf:tickets] #272 checkpoint overwrite returns timeout when controllers are running with different compatible versions
- **status**: unassigned -- assigned - **assigned_to**: A V Mahesh (AVM) - **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#272] checkpoint overwrite returns timeout when controllers are running with different compatible versions** **Status:** assigned **Milestone:** 4.5.2 **Created:** Fri May 17, 2013 11:40 AM UTC by Sirisha Alla **Last Updated:** Thu Aug 06, 2015 04:26 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tar.gz](https://sourceforge.net/p/opensaf/tickets/272/attachment/logs.tar.gz) (175.5 kB; application/x-gzip)

The issue is seen on an OEL6.4 TCP setup. The changeset being used is 4241 with patches 2794 and 3117. The active controller (SC-1) is running version 4.3 while the standby controller (SC-2) is running cs3533 (4.2.x).

A non-collocated checkpoint replica is created on the active controller, and a section is created in the checkpoint. The write and read APIs are successful, but the overwrite API keeps returning timeout; after 5 seconds the application times out and exits. No ckptnd or agent crashes were observed. When the same application is run on SC-2, it runs without any error. Attaching the journal and the traces of ckptnd and ckptd on both controllers.

--- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #241 cpsv : saCkptCheckpointOpen writes to const SaNameT
- **status**: assigned -- unassigned --- ** [tickets:#241] cpsv : saCkptCheckpointOpen writes to const SaNameT** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 06:28 AM UTC by A V Mahesh (AVM) **Last Updated:** Wed Jul 15, 2015 02:47 PM UTC **Owner:** A V Mahesh (AVM)

from http://devel.opensaf.org/ticket/1731

Problem: osaf/libs/agents/saf/cpa/cpa_api.c line 648:

m_CPSV_SET_SANAMET(checkpointName);

However, checkpointName is:

const SaNameT *checkpointName

and m_CPSV_SET_SANAMET does

memset((uns8 *)&name->value[name->length], 0, (SA_MAX_NAME_LENGTH - name->length));

This causes a segfault if the value passed in is in read-only memory. The bug is present in opensaf-staging/1057c1e6ebba; I'm not sure what version that is.

Example:

#define CKPT_NAME "safCkpt=My_Ckpt,safApp=safCkptService"
const SaNameT ckpt_name = { sizeof(CKPT_NAME) - 1, CKPT_NAME };

Then call saCkptCheckpointOpen on it.
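The failure mode is easy to reproduce outside OpenSAF: a `const`-qualified object with static initialization may be placed in a read-only segment, and an in-place memset into it then traps. A sketch using a hypothetical `name_t` that mirrors the role of SaNameT (not the real SAF header), with a copy-then-pad alternative:

```c
#include <string.h>

#define MAX_NAME_LENGTH 256

/* Hypothetical mirror of SaNameT: length-prefixed, fixed-size value buffer */
typedef struct {
    unsigned short length;
    char value[MAX_NAME_LENGTH];
} name_t;

/* What m_CPSV_SET_SANAMET effectively does: zero-pad in place.
 * Calling this on a const object placed in .rodata segfaults. */
static void set_name_in_place(name_t *name)
{
    memset(&name->value[name->length], 0, MAX_NAME_LENGTH - name->length);
}

/* Safe shape (sketch): copy the caller's const name, then pad the copy */
static name_t set_name_copy(const name_t *in)
{
    name_t out = *in;
    set_name_in_place(&out);
    return out;
}
```

The copy keeps the agent's internal normalization without ever writing through the caller's `const` pointer.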
[tickets] [opensaf:tickets] #265 mds : OpenSAF cannot start with mutex type PTHREAD_MUTEX_ERRORCHECK_NP
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#265] mds : OpenSAF cannot start with mutex type PTHREAD_MUTEX_ERRORCHECK_NP** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 08:48 AM UTC by A V Mahesh (AVM) **Last Updated:** Wed Jul 15, 2015 02:43 PM UTC **Owner:** nobody

http://devel.opensaf.org/ticket/759

In pursuit of the problem described in http://devel.opensaf.org/ticket/753 I changed the mutex type in the general code (ncs_os_lock in os_defs.c) to PTHREAD_MUTEX_ERRORCHECK_NP. Then I get recursive locking in MDS:

opensaf-staging$ cat /var/lib/opensaf/stdouts/ncs_rde
35: Resource deadlock avoided
ncs_rde: os_defs.c:783: ncs_os_lock: Assertion `0' failed.

(gdb) bt
#0 0x7f31603db4b5 in __GI_raise (sig=&lt;value optimized out&gt;) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x7f31603def50 in __GI_abort () at abort.c:92
#2 0x7f31603d4481 in __GI___assert_fail (assertion=0x7f3160b8dbf1 "0", file=&lt;value optimized out&gt;, line=783, function=0x7f3160b8f4b4 "ncs_os_lock") at assert.c:81
#3 0x7f3160b62313 in ncs_os_lock (lock=&lt;value optimized out&gt;, request=&lt;value optimized out&gt;, type=&lt;value optimized out&gt;) at os_defs.c:783
#4 0x7f3160b4c97c in ncs_spir_api (info=0x7fff8c182190) at ncs_sprr.c:360
#5 0x7f3160b88d2c in mda_lib_req (req=0x7fff8c182400) at ncs_mda.c:157
#6 0x7f3160b4cf77 in ncs_spir_api (info=0x7fff8c1826f0) at ncs_sprr.c:526
#7 0x7f3160b88ebb in mda_lib_req (req=0x7fff8c1829a0) at ncs_mda.c:105
#8 0x7f3160b4bab0 in ncs_mds_startup (argc=&lt;value optimized out&gt;, argv=0x7fff8c182b50) at ncs_main_pub.c:353
#9 0x7f3160b4c372 in ncs_core_agents_startup (argc=0, argv=0x7fff8c182b50) at ncs_main_pub.c:446
#10 0x7f3160b4c429 in ncs_agents_startup (argc=923, argv=0x39b) at ncs_main_pub.c:225
#11 0x00402d20 in rde_agents_startup () at rde_amf.c:425
#12 0x0040421f in main (argc=&lt;value optimized out&gt;, argv=0x7fff8c182fa8) at rde_main.c:122
[tickets] [opensaf:tickets] #266 mds : Error codes are not forwarded in ncsmds_api
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#266] mds : Error codes are not forwarded in ncsmds_api** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 08:50 AM UTC by A V Mahesh (AVM) **Last Updated:** Wed Jul 15, 2015 02:43 PM UTC **Owner:** nobody

http://devel.opensaf.org/ticket/2267

The return value from enc_full or flat, for example, is not forwarded all the way up to the caller of the MDS API:

rc = ncsmds_api(&mds_info)

For example, an invalid-parameter error from encoding could be useful to return back to the user. Now only NCSCC_RC_FAILURE is returned back for all errors. This pattern appears in other places in the MDS API as well. What is the reason for not forwarding return codes?

From mds_c_sndrcv.c, the status below is not forwarded:

m_MDS_LOG_DBG("MDS_SND_RCV : calling cb ptr enc or enc flat in mcm_msg_encode_full_or_flat_and_send\n");
status = svc_cb->cback_ptr(&cbinfo);
if (status != NCSCC_RC_SUCCESS) {
	m_MDS_LOG_ERR("MDS_SND_RCV: Encode callback of Dest =%d, Adest=%llx, svc-id=%d failed while sending to svc=%d", dest_vdest_id, adest, svc_cb->svc_id, to_svc_id);
	m_MDS_LOG_DBG("MDS_SND_RCV : Leaving mcm_msg_encode_full_or_flat_and_send\n");
	if (msg_send.msg.encoding == MDS_ENC_TYPE_FLAT) {
		m_MMGR_FREE_BUFR_LIST(msg_send.msg.data.flat_uba.start);
	} else if (msg_send.msg.encoding == MDS_ENC_TYPE_FULL) {
		m_MMGR_FREE_BUFR_LIST(msg_send.msg.data.fullenc_uba.start);
	}
	return NCSCC_RC_FAILURE;
}
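The requested change can be sketched in a few lines: keep the cleanup path, but return the status the callback produced instead of collapsing every failure to one code. The names below are hypothetical stand-ins for the NCSCC_RC_* codes and the encode callback:

```c
/* Hypothetical return codes mirroring the NCSCC_RC_* convention */
enum rc {
    RC_SUCCESS = 1,
    RC_FAILURE = 2,
    RC_INVALID_INPUT = 7,
};

/* Stand-in for the encode callback; fails with a specific code */
static enum rc encode_cb(int bad_param)
{
    return bad_param ? RC_INVALID_INPUT : RC_SUCCESS;
}

/* Sketch of the change the ticket asks for: free resources on failure,
 * then forward the callback's status instead of flattening it. */
static enum rc encode_and_send(int bad_param)
{
    enum rc status = encode_cb(bad_param);
    if (status != RC_SUCCESS) {
        /* ... free the buffer list here, as the original code does ... */
        return status;   /* was: return RC_FAILURE for every error */
    }
    return RC_SUCCESS;
}
```

With this shape the caller of the API can distinguish an invalid parameter from a transport failure at no extra cost.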
[tickets] [opensaf:tickets] #1423 ckptnd doesn't handle fault case when creating share memory at start up
Currently, I'm busy with other stuff. I'll fix this in the next release.

--- ** [tickets:#1423] ckptnd doesn't handle fault case when creating share memory at start up** **Status:** assigned **Milestone:** future **Created:** Tue Jul 21, 2015 06:33 AM UTC by Pham Hoang Nhat **Last Updated:** Tue Aug 11, 2015 06:10 AM UTC **Owner:** Pham Hoang Nhat

Observed behaviour
--
When installing a campaign with a test component, ckptnd triggers a core dump.

Error messages
--
Following is the message in the syslog:

Jun 17 07:50:41 SC-2-2 osafckptnd[11361]: ER cpnd open request fail for RDWR mode (null)
Jun 17 07:50:51 SC-2-2 kernel: [ 494.474214] osafckptnd[11361]: segfault at 0 ip 7f25cd609608 sp 7fffdb6290b8 error 4 in libc-2.19.so[7f25cd57f000+19e000]

Following is the bt:

(gdb) bt
#0 0x7fb733293608 in _wordcopy_fwd_dest_aligned () from /lib64/libc.so.6
#1 0x7fb73328db8a in __memmove_sse2 () from /lib64/libc.so.6
#2 0x7fb7343258cc in ncs_os_posix_shm (req=0x7fffe65e7090) at os_defs.c:836
#3 0x00415d1f in cpnd_find_free_loc ()
#4 0x00415f46 in cpnd_restart_shm_client_update ()
#5 0x00405a5b in cpnd_evt_proc_ckpt_init ()
#6 0x0040d532 in cpnd_process_evt ()
#7 0x0040e235 in cpnd_main_process ()
#8 0x0040edf7 in main ()
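Independent of the root cause of the failed open ("cpnd open request fail for RDWR mode"), the memmove through a NULL base address ("segfault at 0") could be turned into a reportable error with a guard of this shape (hypothetical names, not the actual cpnd code):

```c
#include <stddef.h>
#include <string.h>

/* Sketch: validate the shared-memory mapping before copying into it.
 * The backtrace shows memmove inside ncs_os_posix_shm() faulting at
 * address 0 after the shm open failed; checking the base address lets
 * the caller propagate an error instead of crashing. */
static int shm_write(void *shm_base, size_t offset, const void *rec, size_t len)
{
    if (shm_base == NULL || rec == NULL)  /* open/mmap failed earlier */
        return -1;                        /* report, don't dereference */
    memmove((char *)shm_base + offset, rec, len);
    return 0;
}
```

The real fix also has to handle the failed shm creation itself; the guard only keeps the fault case from escalating to a core dump.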
[tickets] [opensaf:tickets] #68 failover didnot succeed and cluster got reset due to MDS problems.
- **status**: unassigned -- assigned - **assigned_to**: A V Mahesh (AVM) - **Type**: enhancement -- defect - **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#68] failover didnot succeed and cluster got reset due to MDS problems.** **Status:** assigned **Milestone:** 4.5.2 **Created:** Sat May 11, 2013 05:22 PM UTC by surender khetavath **Last Updated:** Fri Aug 07, 2015 04:19 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tgz](https://sourceforge.net/p/opensaf/tickets/68/attachment/logs.tgz) (16.2 MB; application/x-compressed-tar)

Changeset: 4241 with patches 2794 and 3117
Model: TwoN
Configuration: 1 App, 1 SG, 4 SUs with 3 comps each, and 5 SIs with 3 CSIs each
Transport: TCP/ipv6-linklocal, PBE enabled

Scenario: SC-1 was active and SC-2 standby. The active SU on SC-1 was shut down and a component was made to reject the quiescing assignment. The component got restarted 10 times (as compRestartMax=10) and was then escalated to node failover following an SU failover. SC-2 did not become active and eventually rebooted, thus causing a cluster reset.

syslog on sc-1:
--
May 11 21:24:49 sc-1 osafimmnd[4683]: WA Error code 2 returned for message type 21 - ignoring
May 11 21:24:49 sc-1 osafamfnd[4790]: NO Received reboot order, ordering reboot now!
May 11 21:24:49 sc-1 osafamfnd[4790]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received reboot order
May 11 21:24:49 sc-1 opensaf_reboot: Rebooting local node
May 11 21:24:49 sc-1 osafimmnd[4683]: WA MESSAGE:5319 OUT OF ORDER my highest processed:5317, exiting
May 11 21:24:49 sc-1 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
May 11 21:24:49 sc-1 osafntfimcnd[4734]: ER saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
May 11 21:24:49 sc-1 osafimmd[4668]: WA IMMND coordinator at 2010f apparently crashed =&gt; electing new coord
May 11 21:24:49 sc-1 osafimmd[4668]: ER Failed to find candidate for new IMMND coordinator
May 11 21:24:49 sc-1 osafimmd[4668]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
May 11 21:24:49 sc-1 osafimmd[4668]: ER IMM RELOAD =&gt; ensure cluster restart by IMMD exit at both SCs, exiting

syslog on sc-2:
--
May 11 21:24:49 sc-2 osafimmd[3894]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2010f)
May 11 21:24:49 sc-2 osafntfimcnd[3969]: NO exiting on signal 15
May 11 21:24:49 sc-2 osafsmfd[4052]: ER amf_active_state_handler oi activate FAILED
May 11 21:24:49 sc-2 osafamfnd[4023]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
May 11 21:24:49 sc-2 osafamfnd[4023]: ER safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackFailed Recovery is:nodeFailfast
May 11 21:24:49 sc-2 osafamfnd[4023]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast
May 11 21:24:49 sc-2 osafmsgd[4216]: ER mqd_imm_declare_implementer failed: err = 14
May 11 21:24:49 sc-2 osafckptd[4202]: ER cpd immOiImplmenterSet failed with err = 14
May 11 21:24:49 sc-2 opensaf_reboot: Rebooting local node
[tickets] [opensaf:tickets] #1338 mds : Optimized the mds_library_mutex locks for better core readability
--- ** [tickets:#1338] mds : Optimized the mds_library_mutex locks for better core readability** **Status:** assigned **Milestone:** 4.7-Tentative **Created:** Fri Apr 24, 2015 05:16 AM UTC by A V Mahesh (AVM) **Last Updated:** Fri Aug 07, 2015 03:49 AM UTC **Owner:** A V Mahesh (AVM)

Now in the MDS code the mds_library_mutex unlock/lock is taken before and after the function mds_mcm_time_wait(), and this is done across the code. If we move this mds_library_mutex unlock/lock inside the function mds_mcm_time_wait(), the code will have more readability and allows some code cleanup.

Example changes:

@@ -2435,9 +2438,7 @@ static uint32_t mcm_pvt_normal_svc_sndrs
 			fr_svc_id, to_svc_id, to_dest);
 		return status;
 	} else {
-		osaf_mutex_unlock_ordie(&gl_mds_library_mutex);
 		if (NCSCC_RC_SUCCESS != mds_mcm_time_wait(&sync_queue->sel_obj, req->info.sndrsp.i_time_to_wait)) {
-			osaf_mutex_lock_ordie(&gl_mds_library_mutex);
 			/* This is for response for local dest */
 			if (sync_queue->status == NCSCC_RC_SUCCESS) {
 				/* success case */
@@ -2458,7 +2459,6 @@ static uint32_t mcm_pvt_normal_svc_sndrs
 			mcm_pvt_del_sync_send_entry((MDS_PWE_HDL)env_hdl, fr_svc_id, xch_id, req->i_sendtype, 0);
 			return NCSCC_RC_REQ_TIMOUT;
 		} else {
-			osaf_mutex_lock_ordie(&gl_mds_library_mutex);
 			if (NCSCC_RC_SUCCESS != mds_check_for_mds_existence(&sync_queue->sel_obj, env_hdl, fr_svc_id, to_svc_id)) {
 				m_MDS_LOG_INFO("MDS_SND_RCV: MDS entry doesnt exist\n");
@@ -2549,15 +2549,18 @@ static uint32_t mds_await_active_tbl_del
 static uint32_t mds_mcm_time_wait(NCS_SEL_OBJ *sel_obj, uint32_t time_val)
 {
+	osaf_mutex_unlock_ordie(&gl_mds_library_mutex);
 	/* Now wait for the response to come */
 	int count = osaf_poll_one_fd(sel_obj->rmv_obj, time_val == 0 ? -1 : (time_val * 10));
+	osaf_mutex_lock_ordie(&gl_mds_library_mutex);
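The proposal above can be sketched with plain pthreads and poll(): the wait helper owns the unlock/wait/relock sequence, so every call site shrinks to a single call. This is a sketch of the shape only, not the OpenSAF implementation (osaf_poll_one_fd and the ordie wrappers are replaced by their POSIX equivalents):

```c
#include <poll.h>
#include <pthread.h>

static pthread_mutex_t gl_mds_library_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Sketch of the proposed shape: the helper drops the library mutex
 * for the duration of the blocking wait and re-acquires it before
 * returning, so callers no longer repeat unlock/lock around it. */
static int time_wait_locked(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    pthread_mutex_unlock(&gl_mds_library_mutex);  /* don't block while holding the lock */
    int count = poll(&pfd, 1, timeout_ms);
    pthread_mutex_lock(&gl_mds_library_mutex);    /* re-acquire before returning */
    return count;                                 /* 0 on timeout, 1 if readable */
}
```

The caller must hold gl_mds_library_mutex when calling the helper, exactly as the original call sites do; the invariant "lock held on entry and on exit" is what makes the cleanup safe.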
[tickets] [opensaf:tickets] #1317 ckpt : stale replicas observed in a 70 node cluster
- **status**: unassigned -- assigned - **assigned_to**: A V Mahesh (AVM) - **Milestone**: 4.4.2 -- 4.5.2 --- ** [tickets:#1317] ckpt : stale replicas observed in a 70 node cluster** **Status:** assigned **Milestone:** 4.5.2 **Created:** Wed Apr 15, 2015 10:16 AM UTC by Sirisha Alla **Last Updated:** Wed Apr 15, 2015 10:16 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1317/attachment/logs.tar.bz2) (6.5 MB; application/x-bzip)

This issue is observed on cs6377 (46FC Tag). The cluster is of 70 nodes and 2 checkpoint applications run on each node. The application running on the active controller creates the checkpoint, while the applications running on the other nodes open the same checkpoint and use it. After sections are created, written and read, all the applications finalize the handles used. The retention duration of the checkpoint is specified to a minimal value of 1000 nanoseconds.

/dev/shm on the active controller after the applications exited:

SLES-64BIT-SLOT1:~ # date;ls -lrt /dev/shm/
Wed Apr 15 14:25:09 IST 2015
total 1772
-rw-r--r-- 1 opensaf opensaf 1076040 Apr 15 13:38 opensaf_NCS_MQND_QUEUE_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf  328000 Apr 15 13:38 opensaf_NCS_GLND_RES_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf      16 Apr 15 13:38 opensaf_NCS_GLND_LCK_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf   88000 Apr 15 13:38 opensaf_NCS_GLND_EVT_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf  704008 Apr 15 13:38 opensaf_CPND_CHECKPOINT_INFO_131343
-rw-r--r-- 1 opensaf opensaf   79848 Apr 15 13:55 opensaf_safCkpt=active_replica_ckpt_name_1_sysgrou_131343_4
-rw-r--r-- 1 opensaf opensaf   79848 Apr 15 13:56 opensaf_safCkpt=active_replica_ckpt_name_1_sysgrou_131343_9
-rw-r--r-- 1 opensaf opensaf   79848 Apr 15 13:57 opensaf_safCkpt=active_replica_ckpt_name_1_sysgrou_131343_16
SLES-64BIT-SLOT1:~ # date;immfind|grep -i ckpt
Wed Apr 15 14:25:11 IST 2015
safApp=safCkptService
SLES-64BIT-SLOT1:~ #

When a checkpoint with the same name is then created again, the checkpoint service does not create a new replica in the shared memory. cpd and cpnd traces are attached.
[tickets] [opensaf:tickets] #1305 cpsv: non-collocated ckpts are not receiving track changes if physical replica doesn't exist
- **Milestone**: 4.7-Tentative -- 4.6.1 --- ** [tickets:#1305] cpsv: non-collocated ckpts are not receiving track changes if physical replica doesn't exist** **Status:** assigned **Milestone:** 4.6.1 **Created:** Tue Apr 07, 2015 09:03 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu Aug 06, 2015 04:28 AM UTC **Owner:** A V Mahesh (AVM) The track changes through a callback are not being received for non-collocated Checkpoints, if physical replica doesn't exist on that particular controller/payload blade. For the non-collocated Checkpoints, OpenSAF Checkpoint Service will specify the location of the checkpoint replicas as per the following policy: If a non-collocated checkpoint is opened for the first time by an application residing on a payload blade, the replicas will be created on the local payload blade and both the system controller nodes. In this case, the replica residing on the payload blade is designated as active replica. If a non-collocated checkpoint is opened for the first time by an application residing on the system controller nodes, the replica will be created only on the system controller blade. In this case, this replica on a system controller node will act as the active replica. If another application opens the same checkpoint from a payload node, the checkpoint service will not create the replica on that node.
[tickets] [opensaf:tickets] #1285 MDS TCP: zero bytes recvd results in application exit
- **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#1285] MDS TCP: zero bytes recvd results in application exit** **Status:** assigned **Milestone:** 4.5.2 **Created:** Thu Mar 26, 2015 09:49 AM UTC by Girish **Last Updated:** Fri Aug 07, 2015 04:03 AM UTC **Owner:** A V Mahesh (AVM)

Sometimes an application using OpenSAF exits with the below message:

Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safSu=SU1,safSg=app-simplex,safApp=appos' component restart probation timer started (timeout: 40 ns)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of 'safSu=SU1,safSg=app-simplex,safApp=appos' (comp restart count: 1)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safComp=App,safSu=SU1,safSg=app-simplex,safApp=appos' faulted due to 'avaDown' : Recovery is 'componentRestart'

It exits at osaf/libs/core/mds/mds_dt_trans.c::mdtm_process_poll_recv_data_tcp:

recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
if (recd_bytes < 0) {
	return;
} else if (0 == recd_bytes) {
	syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err :%d len:%d", recd_bytes, errno, local_len_buf);
	close(tcp_cb->DBSRsock);
	exit(0);
} else if (local_len_buf > recd_bytes) {

local_len_buf turns out to be 0.
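For comparison, a library-friendly shape for the zero-byte case: recv() returning 0 signals an orderly shutdown by the peer, which library code can report to its caller instead of terminating the whole process with exit(0). This is a hypothetical handler, not the proposed OpenSAF patch:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: classify the recv() outcome and let the caller decide.
 * n > 0  -> data received
 * n == 0 -> peer closed the connection (orderly shutdown): close our
 *           end and return 0 so the caller can reconnect or tear down
 * n < 0  -> real error, errno is set */
static int recv_or_report(int sock, char *buf, size_t len)
{
    ssize_t n = recv(sock, buf, len, 0);
    if (n > 0)
        return (int)n;
    if (n == 0) {
        close(sock);     /* connection lost; do NOT exit() from a library */
        return 0;
    }
    return -1;
}
```

Moving the exit decision up to the application (or to a reconnect path in MDS) is what keeps AMF from seeing the component die with 'avaDown'.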
[tickets] [opensaf:tickets] #1442 log: unable to create new cfg/log files if openning files are corrupted
--- ** [tickets:#1442] log: unable to create new cfg/log files if openning files are corrupted** **Status:** unassigned **Milestone:** 4.5.2 **Created:** Tue Aug 11, 2015 07:20 AM UTC by Vu Minh Nguyen **Last Updated:** Tue Aug 11, 2015 07:20 AM UTC **Owner:** nobody

When something goes wrong with the currently open cfg/log files (e.g. the files on disk are deleted or moved), any action that should lead to creating new cfg/log files fails: logsv sees that it failed to rename the existing files (appending the close time to the file names) and then skips creating new ones.
[tickets] [opensaf:tickets] #1433 log: saflogger should use built-in default log file format in log server
- **status**: unassigned -- accepted - **assigned_to**: Vu Minh Nguyen --- ** [tickets:#1433] log: saflogger should use built-in default log file format in log server** **Status:** accepted **Milestone:** 5.0 **Created:** Wed Aug 05, 2015 06:33 AM UTC by Vu Minh Nguyen **Last Updated:** Wed Aug 05, 2015 06:33 AM UTC **Owner:** Vu Minh Nguyen

Currently, the saflogger tool uses its own defined log file format for the application stream instead of the built-in default format in the log server.
[tickets] [opensaf:tickets] #264 mds : Refactor MDS tests
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#264] mds : Refactor MDS tests** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 08:27 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu May 16, 2013 08:28 AM UTC **Owner:** nobody

http://devel.opensaf.org/ticket/2848

MDS tests are designed for TETware. They should be ported to the simple unit test framework in OpenSAF.
[tickets] [opensaf:tickets] #263 mds : Suspicios comparison
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#263] mds : Suspicios comparison** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 08:25 AM UTC by A V Mahesh (AVM) **Last Updated:** Wed Jul 15, 2015 02:44 PM UTC **Owner:** nobody

http://devel.opensaf.org/ticket/2639

The following comparison is always true:

(recv->snd_type != MDS_SENDTYPE_ACK) || (recv->snd_type != MDS_SENDTYPE_RACK)

Was the intention perhaps:

(recv->snd_type != MDS_SENDTYPE_ACK) && (recv->snd_type != MDS_SENDTYPE_RACK)

The code is found in file osaf/libs/core/mds/mds_c_sndrcv.c, line 4024:

/* For the message loss indication */
if ((true == svccb->i_msg_loss_indication) &&
    ((recv->snd_type != MDS_SENDTYPE_ACK) || (recv->snd_type != MDS_SENDTYPE_RACK))) {
	/* Get the subscription table result table function pointer */
	MDS_SUBSCRIPTION_RESULTS_INFO *lcl_subtn_res = NULL;
	if (NCSCC_RC_SUCCESS == mds_get_subtn_res_tbl_by_adest(recv->dest_svc_hdl, recv->src_svc_id,
							       recv->src_vdest, recv->src_adest, &lcl_subtn_res)) {
		if (recv->src_seq_num != lcl_subtn_res->msg_rcv_cnt) {
			m_MDS_LOG_ERR("MDS_SND_RCV: msg loss detected, Src SVC=%d, Src vdest id= %d, Src adest=%llu, local svc id=%d msg num=%d, recvd cnt=%d\n",
				      recv->src_svc_id, recv->src_vdest, recv->src_adest, svccb->svc_id, recv->src_seq_num, lcl_subtn_res->msg_rcv_cnt);
			mds_mcm_msg_loss(recv->dest_svc_hdl, recv->src_adest, recv->src_svc_id, recv->src_vdest);
			lcl_subtn_res->msg_rcv_cnt = recv->src_seq_num;
			lcl_subtn_res->msg_rcv_cnt++;
		} else {
			lcl_subtn_res->msg_rcv_cnt++;
		}
	} else {
		m_MDS_LOG_INFO("MDS_SND_RCV: msg loss enabled but no subcription exists\n");
	}
}
[tickets] [opensaf:tickets] #258 mds: MDS should use TIPC importance
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#258] mds: MDS should use TIPC importance** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 08:11 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu May 16, 2013 08:14 AM UTC **Owner:** nobody http://devel.opensaf.org/ticket/1772 If the application is using TIPC as its cluster communication protocol, OpenSAF control signalling can be blocked by the application. By using TIPC importance, OpenSAF, together with a well-behaved application, can avoid this scenario. Map MDS priority to TIPC importance.
[tickets] [opensaf:tickets] #238 cpsv : Write for asynchronous non collocated checkpoint returns SA_AIS_ERR_NOT_EXIST in some processes
- **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#238] cpsv : Write for asynchronous non collocated checkpoint returns SA_AIS_ERR_NOT_EXIST in some processes** **Status:** assigned **Milestone:** 4.5.2 **Created:** Thu May 16, 2013 06:17 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu Aug 06, 2015 04:25 AM UTC **Owner:** A V Mahesh (AVM) From http://devel.opensaf.org/ticket/2384 Changeset: 3065 Setup: 70 node SLES11 VM setup. Problem Description: 70 processes are running the below test scenario, with each node hosting a single process. 1) The application running on SC-1 opens a non-collocated checkpoint and creates a section in the checkpoint. 2) The rest of the applications create the checkpoint and, once the section create is successful on SC-1, write into the same section. Some of the applications get SA_AIS_ERR_NOT_EXIST back from the write operation. Traces are not enabled on the setup; /var/log/messages for both controllers can be provided.
[tickets] [opensaf:tickets] #239 cpsv : section create returns ERR_EXIST after few try agains on 70 node cluster
- **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#239] cpsv : section create returns ERR_EXIST after few try agains on 70 node cluster** **Status:** assigned **Milestone:** 4.5.2 **Created:** Thu May 16, 2013 06:19 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu Aug 06, 2015 04:24 AM UTC **Owner:** A V Mahesh (AVM) From http://devel.opensaf.org/ticket/3042 This is seen on a 70-node SLES VM setup. One checkpoint application runs on each node. 1) The checkpoint application on the active controller creates an asynchronous collocated checkpoint. The applications on other nodes open the same checkpoint. 2) The replica is set active on the active controller and a section is created. 3) The section create API returns TRY_AGAIN a few times and then returns ERR_EXIST. When the application gets TRY_AGAIN, the section should not have been created in the checkpoint. This is not always reproducible. Snippet from test journal: 520|0 15 00130961 1 21| FAILED : Section 11 created in active colloc ckpt 520|0 15 00130961 1 22| Return Value : SA_AIS_ERR_TRY_AGAIN 520|0 15 00130961 1 23| 520|0 15 00130961 1 24| Try again count : 8 520|0 15 00130961 1 25| 520|0 15 00130961 1 26| FAILED : Section 11 created in active colloc ckpt 520|0 15 00130961 1 27| Return Value : SA_AIS_ERR_EXIST Attaching CPD and CPND traces of both the controllers
[tickets] [opensaf:tickets] #1423 ckptnd doesn't handle fault case when creating shared memory at start up
- **Type**: defect -- enhancement - **Comment**: Are you targeting this ticket for 4.7? If so change the milestone to 4.7, else I will change it to enhancement. --- ** [tickets:#1423] ckptnd doesn't handle fault case when creating shared memory at start up** **Status:** assigned **Milestone:** future **Created:** Tue Jul 21, 2015 06:33 AM UTC by Pham Hoang Nhat **Last Updated:** Tue Jul 21, 2015 06:35 AM UTC **Owner:** Pham Hoang Nhat Observed behaviour -- While installing a campaign with a test component, ckptnd triggered a core dump. Error messages -- Following is the message in the syslog. Jun 17 07:50:41 SC-2-2 osafckptnd[11361]: ER cpnd open request fail for RDWR mode (null) Jun 17 07:50:51 SC-2-2 kernel: [ 494.474214] osafckptnd[11361]: segfault at 0 ip 7f25cd609608 sp 7fffdb6290b8 error 4 in libc-2.19.so[7f25cd57f000+19e000] Following is the bt: (gdb) bt #0 0x7fb733293608 in _wordcopy_fwd_dest_aligned () from /lib64/libc.so.6 #1 0x7fb73328db8a in __memmove_sse2 () from /lib64/libc.so.6 #2 0x7fb7343258cc in ncs_os_posix_shm (req=0x7fffe65e7090) at os_defs.c:836 #3 0x00415d1f in cpnd_find_free_loc () #4 0x00415f46 in cpnd_restart_shm_client_update () #5 0x00405a5b in cpnd_evt_proc_ckpt_init () #6 0x0040d532 in cpnd_process_evt () #7 0x0040e235 in cpnd_main_process () #8 0x0040edf7 in main ()
[tickets] [opensaf:tickets] #1436 MDS (TCP transport) fragment gets dropped, not received on standby node
- **status**: unassigned -- assigned - **assigned_to**: A V Mahesh (AVM) --- ** [tickets:#1436] MDS (TCP transport) fragment gets dropped, not received on standby node** **Status:** assigned **Milestone:** 4.6.1 **Created:** Thu Aug 06, 2015 06:47 AM UTC by Girish **Last Updated:** Mon Aug 10, 2015 10:49 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [cpsv_test_app.c](https://sourceforge.net/p/opensaf/tickets/1436/attachment/cpsv_test_app.c) (8.5 kB; text/x-csrc) Opensaf version: 4.6 Linux: Standard Fedora 22 release, no additional patches required default wmem_max/rmem_max values default buffer sizes for MDS_SOCK_SND_RCV_BUF_SIZE and DTM_SOCK_SND_RCV_BUF_SIZE Active-standby model opensaf run as root user/group Steps: 1. start opensaf on node1 (active) and node2 (standby) 2. start ckpt_demo (modified application attached) on active node, ./ckpt_demo 1 3. wait till all the data is checkpointed 4. start ckpt_demo on standby node, ./ckpt_demo 0 Notice the error messages in mds.log: MDTM: Some stale message recd, hence dropping adest= My investigation is that one of the fragments is lost: the active node sends it, whereas the standby node does not receive it.
mds log on standby: May 29 4:30:03.089974 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.089995 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.090014 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3049, from src_Tipc_id=0x0002020f:25826, pkt_type=35817 May 29 4:30:03.090032 8461 ERR|MDTM: Reassembling in FULL UB May 29 4:30:03.090174 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.090198 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.090216 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.090238 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.090257 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3050, from src_Tipc_id=0x0002020f:25826, pkt_type=35818 May 29 4:30:03.090275 8461 ERR|MDTM: Reassembling in FULL UB May 29 4:30:03.090735 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.090762 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.090780 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.090801 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.090820 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3051, from src_Tipc_id=0x0002020f:25826, pkt_type=35819 May 29 4:30:03.090838 8461 ERR|MDTM: Reassembling in FULL UB May 29 4:30:03.090978 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.091028 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.091047 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.091068 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.091087 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3053, from src_Tipc_id=0x0002020f:25826, pkt_type=35821 May 29 4:30:03.091106 8461 ERR|MDTM: ERROR Frag recd is not next frag so dropping 
adest=0x0002020f64e2 May 29 4:30:03.091125 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.091143 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.091160 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.091180 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.091198 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3054, from src_Tipc_id=0x0002020f:25826, pkt_type=35822 May 29 4:30:03.091216 8461 ERR|MDTM: Message is dropped as msg is out of seq TRANSPOR-ID=0x0002020f64e2 May 29 4:30:03.091235 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.091283 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.091302 8461 ERR| mdtm_process_poll_recv_data_tcp mds log on active: May 29 4:29:36.021518 25826 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:29:36.021537 25826 ERR|MDTM: Recd message with Fragment Seqnum=5, frag_num=3049, from src_Tipc_id=0x0002020f:25995, pkt_type=35817 May 29 4:29:36.021554 25826 ERR|MDTM: Reassembling in flat UB May 29 4:29:36.021702 25995 ERR|successfully sent message, send_len=1456 May 29 4:29:36.021729 25995 ERR|MDTM:2 Sending message with Service Seqno=4, Fragment Seqnum=5, frag_num=35818, TO Dest_Tipc_id=0x0002020f:25826 May 29 4:29:36.021778 25826 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:29:36.021800 25826 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:29:36.021817 25826 ERR|
[tickets] [opensaf:tickets] #1440 Mds: application crashes with core dump in mds
- **status**: assigned -- review --- ** [tickets:#1440] Mds: application crashes with core dump in mds** **Status:** review **Milestone:** 4.5.2 **Created:** Mon Aug 10, 2015 08:35 AM UTC by Nagendra Kumar **Last Updated:** Mon Aug 10, 2015 08:39 AM UTC **Owner:** A V Mahesh (AVM) Application crashes in mds code at the below location: mds_c_sndrcv.c, line no: 4047: m_MDS_LOG_ERR("MDS_SND_RCV: msg loss detected, Src svc_id = %s(%d), Src vdest id= %d, Src Adest = %" PRIu64 ", local svc_id = %s(%d) msg num=%d, recvd cnt=%d\n", ncsmds_svc_names[recv->src_svc_id], recv->src_vdest, recv->src_adest, ncsmds_svc_names[svccb->svc_id], recv->src_seq_num, lcl_subtn_res->msg_rcv_cnt); The reason is a mismatch between the arguments and the format string: the log requires 8 arguments but only 6 were passed in.
[tickets] [opensaf:tickets] #343 ntf: Implement SAI-AIS-NTF-A.02.01
- **summary**: ntf: -- ntf: Implement SAI-AIS-NTF-A.02.01 --- ** [tickets:#343] ntf: Implement SAI-AIS-NTF-A.02.01** **Status:** unassigned **Milestone:** future **Created:** Mon May 27, 2013 08:33 AM UTC by Praveen **Last Updated:** Wed Jul 15, 2015 02:32 PM UTC **Owner:** nobody Migrated from http://devel.opensaf.org/ticket/680. New functions: saNtfVariableDataSizeGet SaNtfStaticSuppressionFilterSetCallbackT Changed functions: saNtfInitialize_2 * - due to suppression callbacks saNtfStateChangeNotificationFilter_2 saNtfStateChangeNotificationAllocateFilter_2 saNtfLocalizedMessageFree_2 - add ntfHandle saNtfNotificationUnsubscribe_2 - add ntfHandle saNtfNotificationReadInitialize_2 * saNtfCallbacksT_2 * - due to suppression callbacks (* = changed in A.03.01) Admin API - IMM integration Notifications
[tickets] [opensaf:tickets] #467 checkpoint with COLLOCATED flag forcing to register for arrival callback
- **status**: unassigned -- assigned - **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#467] checkpoint with COLLOCATED flag forcing to register for arrival callback** **Status:** assigned **Milestone:** 4.5.2 **Created:** Mon Jun 24, 2013 06:36 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu Aug 06, 2015 04:30 AM UTC **Owner:** A V Mahesh (AVM) I am using OpenSAF 4.0.0 http://devel.opensaf.org/ticket/1866 I am running a simple AMF demo for counting which uses checkpoint. My checkpoint creation flags are: SA_CKPT_CHECKPOINT_COLLOCATED | SA_CKPT_WR_ALL_REPLICAS. I tested it on a 2-node cluster (both target hardware and UML nodes). The problem is that unless I register for the arrival callback, my standby component faults; AMF reports a healthcheck timeout. I tested with SA_CKPT_CHECKPOINT_COLLOCATED | SA_CKPT_WR_ACTIVE_REPLICA also, and am facing the same issue. If I remove the collocated flag, it works fine.
[tickets] [opensaf:tickets] #722 payloads did not go for reboot when both the controllers rebooted
- **status**: unassigned -- assigned - **assigned_to**: A V Mahesh (AVM) - **Milestone**: 4.7-Tentative -- 4.5.2 --- ** [tickets:#722] payloads did not go for reboot when both the controllers rebooted** **Status:** assigned **Milestone:** 4.5.2 **Created:** Thu Jan 16, 2014 07:36 AM UTC by Sirisha Alla **Last Updated:** Fri Aug 07, 2015 04:24 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [payloadnoreboot.tar.bz2](https://sourceforge.net/p/opensaf/tickets/722/attachment/payloadnoreboot.tar.bz2) (765.1 kB; application/x-bzip) The issue is seen on changeset 4733 plus the CLM patches corresponding to the changesets of #220. Continuous failovers are happening while some API invocations of the IMM application are ongoing. The IMMD has asserted on the new active, which is reported in ticket #721. When both controllers got rebooted, the payloads did not get rebooted; instead, the OpenSAF services are up and running. CLM shows that both payloads are not part of the cluster. When the payloads were restarted manually, they joined the cluster.
PL-3 syslog: Jan 15 18:23:09 SLES-64BIT-SLOT3 osafimmnd[3550]: NO implementer for class 'testMA_verifyObjApplNoResponseModCallback_101' is released = class extent is UNSAFE Jan 15 18:23:59 SLES-64BIT-SLOT3 logger: Invoking failover from invoke_failover.sh Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA DISCARD DUPLICATE FEVS message:92993 Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Error code 2 returned for message type 57 - ignoring Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA DISCARD DUPLICATE FEVS message:92994 Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Error code 2 returned for message type 57 - ignoring Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Director Service in NOACTIVE state - fevs replies pending:1 fevs highest processed:92994 Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: NO No IMMD service = cluster restart Jan 15 18:24:01 SLES-64BIT-SLOT3 osafamfnd[3572]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart' Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[6827]: Started Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[6827]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added) Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176901] TIPC: Resetting link 1.1.3:eth0-1.1.2:eth0, peer not responding Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176911] TIPC: Lost link 1.1.3:eth0-1.1.2:eth0 on network plane A Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176918] TIPC: Lost contact with 1.1.2 Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256091] TIPC: Resetting link 1.1.3:eth0-1.1.1:eth0, peer not responding Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256100] TIPC: Lost link 1.1.3:eth0-1.1.1:eth0 on network plane A Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256106] TIPC: Lost contact with 1.1.1 Jan 15 18:24:25 SLES-64BIT-SLOT3 kernel: [ 6361.425537] TIPC: Established link 1.1.3:eth0-1.1.2:eth0 on network plane A Jan 15 
18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_ANONYMOUS -- IMM_SERVER_CLUSTER_WAITING Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING -- IMM_SERVER_LOADING_PENDING Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING -- IMM_SERVER_LOADING_CLIENT Jan 15 18:24:29 SLES-64BIT-SLOT3 osafimmnd[6827]: NO ERR_BAD_HANDLE: Admin owner 1 does not exist Jan 15 18:24:36 SLES-64BIT-SLOT3 kernel: [ 6372.473240] TIPC: Established link 1.1.3:eth0-1.1.1:eth0 on network plane A Jan 15 18:24:39 SLES-64BIT-SLOT3 osafimmnd[6827]: NO ERR_BAD_HANDLE: Admin owner 2 does not exist Jan 15 18:24:39 SLES-64BIT-SLOT3 osafimmnd[6827]: NO NODE STATE- IMM_NODE_LOADING Jan 15 18:24:45 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:5000 Jan 15 18:24:46 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:6000 Jan 15 18:24:47 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:7000 Jan 15 18:24:48 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:8000 Jan 15 18:24:49 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:9000 After both the controllers came up, the following is the status:
SLES-64BIT-SLOT1:~ # immlist safNode=PL-3,safCluster=myClmCluster
Name                          Type         Value(s)
safNode                       SA_STRING_T  safNode=PL-3
saClmNodeLockCallbackTimeout  SA_TIME_T    500 (0xba43b7400, Thu Jan 1 05:30:50 1970)
saClmNodeIsMember             SA_UINT32_T  Empty
saClmNodeInitialViewNumber    SA_UINT64_T  Empty
saClmNodeID                   SA_UINT32_T  Empty
saClmNodeEE
[tickets] [opensaf:tickets] #520 Mds: Tune MDS logging to minimal informative
- **Milestone**: 5.0 -- 4.7-Tentative --- ** [tickets:#520] Mds: Tune MDS logging to minimal informative ** **Status:** assigned **Milestone:** 4.7-Tentative **Created:** Thu Jul 25, 2013 01:19 PM UTC by hano **Last Updated:** Fri Mar 13, 2015 12:09 PM UTC **Owner:** A V Mahesh (AVM) Minimize MDS logging to only what is required, so that the 1 MB log rotation size is not reached too soon. An amfnd core dump is produced when the amfnd main thread (10720) is waiting for a pthread mutex, gl_mds_library_mutex, which is held by the mds thread (10723). The AMF watchdog detects this (no healthchecks received) and sends an abort signal to amfnd. Holding a mutex during file operations in MDS is not correct and should be corrected. (HR50165) #0 0x7f7830d70294 in __lll_lock_wait () from /lib64/libpthread.so.0 (gdb) p gl_mds_library_mutex $1 = {__data = {__lock = 2, __count = 1, __owner = 10723, __nusers = 1, __kind = 1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = ã ) , ' ' repeats 22 times, __align = 4294967298} (gdb) info thr Id Target Id Frame 4 Thread 0x7f7832263b00 (LWP 10723) 0x7f783083e20d in write () from /lib64/libc.so.6 3 Thread 0x7f7832283b00 (LWP 10722) 0x7f7830844f53 in select () from /lib64/libc.so.6 2 Thread 0x7f7832243b00 (LWP 10724) 0x7f7830d7076d in read () from /lib64/libpthread.so.0 * 1 Thread 0x7f7832286700 (LWP 10720) 0x7f7830d70294 in __lll_lock_wait () from /lib64/libpthread.so.0
[tickets] [opensaf:tickets] #249 mds : tipc Invalid read errors in mds
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#249] mds : tipc Invalid read errors in mds** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 06:44 AM UTC by A V Mahesh (AVM) **Last Updated:** Wed Jul 15, 2015 02:45 PM UTC **Owner:** nobody **Attachments:** - [valgrind.log](https://sourceforge.net/p/opensaf/tickets/249/attachment/valgrind.log) (23.1 kB; application/octet-stream) from http://devel.opensaf.org/ticket/1820 We are using OpenSAF 4.1 and, while running valgrind on one of our applications, we noticed a number of invalid read errors that seem to arise in the MDS library code. Attached is the valgrind report of the same.
[tickets] [opensaf:tickets] #250 mds : tipc Missing error Checking in mds_dt_tipc.c
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#250] mds : tipc Missing error Checking in mds_dt_tipc.c** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 06:45 AM UTC by A V Mahesh (AVM) **Last Updated:** Wed Jul 15, 2015 02:45 PM UTC **Owner:** nobody http://devel.opensaf.org/ticket/574 Missing error handling of calls to ncs_enc_init_space_pp() and ncs_encode_n_octets_in_uba() in mds_dt_tipc.c. This can cause a segmentation fault.
[tickets] [opensaf:tickets] #253 mds : 1.5 sec wait added in RSP send causes problems in MDS clients
- **status**: assigned -- unassigned - **assigned_to**: A V Mahesh (AVM) -- nobody --- ** [tickets:#253] mds : 1.5 sec wait added in RSP send causes problems in MDS clients** **Status:** unassigned **Milestone:** future **Created:** Thu May 16, 2013 06:54 AM UTC by A V Mahesh (AVM) **Last Updated:** Wed Jul 15, 2015 02:44 PM UTC **Owner:** nobody from http://devel.opensaf.org/ticket/2825 The single-threaded LOG server stalled waiting for the file system for longer than 10 sec, which is the sync timeout in the LOG library. This causes LOG clients (e.g. the NTF server) to time out and retry. This creates a backlog of outdated messages in the LOG server mailbox. When those eventually are handled, the 1.5 sec in MDS is added to each RSP send. Therefore the LOG server never catches up with received messages in the mailbox. The change introduced in #2611 introduced an unacceptable hidden delay when sending messages that can have consequences for any client with soft real time requirements, for example AMF HC timeouts. References: http://devel.opensaf.org/ticket/2611 http://list.opensaf.org/pipermail/devel/2012-April/022254.html Workaround: the LOG server throws away rotten messages that are older than 10 sec. Proposed long term solution: MDS should buffer incoming data messages until the corresponding SVC up message is received and potentially delivered to the client. Replying to hafe: Single threaded LOG server stalled waiting for file system for a longer time than 10 sec which is the sync tmo in the LOG library. This causes LOG clients (e.g. NTF server) to timeout and retry. LOG service or any other service (like dtsv) that does disk i/o is prone to these situations. This creates a backlog of outdated messages in the LOG server mailbox. When those eventually are handled, the 1.5 sec in MDS is added to each RSP send. Therefore the LOG server never catches up with received messages in the mailbox. This is a case of a slow receiver.
More in the next comment The change introduced in #2611 introduced an unacceptable hidden delay when sending messages that can have consequences for any client with soft real time requirements. For example AMF HC timeouts. I don't think that change (in MDS) can 'directly and always' result in making LOG a 'slow transmitter'! Because, the 1.5 seconds I believe is only when the MDS client starts up, like during a node bootup. Having said that, such services that are dependent on responses from external resources (modules), like disk i/o in this case, should be tuned to have generally bigger healthcheck timeouts. Surya, could you please comment on Hans' theory on the 1.5 seconds. References: http://devel.opensaf.org/ticket/2611 http://list.opensaf.org/pipermail/devel/2012-April/022254.html Workaround: LOG server throws away rotten messages that are older than 10 sec. Proposed long term solution: MDS should buffer incoming data messages until the corresponding SVC up message is received and potentially delivered to the client. Changed 8 months ago by mathi ¶ I mean, if we try to formulate and understand the problem: If the problem is health check timeouts, we should do the following: • increase the timeout for healthcheck, and • if necessary, introduce a separate healthcheck thread. If the problem is about clients' receiving retry, then these situations would occur typically when the shared filesystem is/was undergoing a role change or is in the process of some heavy sync operation, etc. In such situations, returning TRY_AGAIN is a genuine way of handling such situations (typically these situations can occur only during an upgrade kind of scenario that might involve role change or when some fault at the disk level and not during normal lifecycle when the healthchecks.)
If the problem is a timeout that is caused by slow processing, then we could think of introducing some protocol between the LGA and LGS to improve the congestion. I mean I'm tending to think along this angle; the end solution may involve LGA, LGS or even MDS, but I think the problems being described here would have occurred even without 2611, and as such 2611 cannot contribute much to the problem formulated in this ticket. Having said that, throwing away older messages shouldn't be a problem, but I'm trying to understand how that could improve the situation... Changed 7 months ago by nagendra ¶ ■owner changed from surya to nagendra ■status changed from new to accepted Changed 7 months ago by nagendra ¶ ■owner changed from nagendra to surya ■status changed from accepted to assigned Changed 7 months ago by surya ¶ ■status changed from assigned to accepted Changed 7 months ago by surya ¶ ■patch_waiting changed from no to yes Changed 7 months ago by mahesh ¶ Steps to test: 1) Pause osaflogd process (# kill -STOP osaflogd PID ) 2) Write to system stream using saflogger tool (#/usr/local/bin/saflogger -y Out of
[tickets] [opensaf:tickets] #1423 ckptnd doesn't handle fault case when creating shared memory at start up
- **status**: assigned -- unassigned - **assigned_to**: Pham Hoang Nhat -- nobody --- ** [tickets:#1423] ckptnd doesn't handle fault case when creating shared memory at start up** **Status:** unassigned **Milestone:** future **Created:** Tue Jul 21, 2015 06:33 AM UTC by Pham Hoang Nhat **Last Updated:** Tue Aug 11, 2015 06:36 AM UTC **Owner:** nobody Observed behaviour -- While installing a campaign with a test component, ckptnd triggered a core dump. Error messages -- Following is the message in the syslog. Jun 17 07:50:41 SC-2-2 osafckptnd[11361]: ER cpnd open request fail for RDWR mode (null) Jun 17 07:50:51 SC-2-2 kernel: [ 494.474214] osafckptnd[11361]: segfault at 0 ip 7f25cd609608 sp 7fffdb6290b8 error 4 in libc-2.19.so[7f25cd57f000+19e000] Following is the bt: (gdb) bt #0 0x7fb733293608 in _wordcopy_fwd_dest_aligned () from /lib64/libc.so.6 #1 0x7fb73328db8a in __memmove_sse2 () from /lib64/libc.so.6 #2 0x7fb7343258cc in ncs_os_posix_shm (req=0x7fffe65e7090) at os_defs.c:836 #3 0x00415d1f in cpnd_find_free_loc () #4 0x00415f46 in cpnd_restart_shm_client_update () #5 0x00405a5b in cpnd_evt_proc_ckpt_init () #6 0x0040d532 in cpnd_process_evt () #7 0x0040e235 in cpnd_main_process () #8 0x0040edf7 in main ()
[tickets] [opensaf:tickets] #1443 log: service is crashed if creating and deleting conf obj class continuously
--- ** [tickets:#1443] log: service is crashed if creating and deleting conf obj class continuously** **Status:** unassigned **Milestone:** 4.5.2 **Created:** Tue Aug 11, 2015 07:37 AM UTC by Vu Minh Nguyen **Last Updated:** Tue Aug 11, 2015 07:37 AM UTC **Owner:** nobody When creating an application object class and deleting it continuously, the log service can crash. To reproduce this case, perform the following command: for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do immcfg -c SaLogStreamConfig safLgStrCfg=TestLog -a saLogStreamPathName=. -a saLogStreamFileName=TestLog; echo "create ($i) - $?"; immcfg -d safLgStrCfg=TestLog; echo "Delete ($i) - $?"; done The output looks something like: create (1) - 0 Delete (1) - 0 create (2) - 0 error - saImmOmCcbObjectDelete for 'safLgStrCfg= TestLog' FAILED: SA_AIS_ERR_FAILED_OPERATION (21) error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21) Delete (2) - 1 error - saImmOmCcbObjectCreate_2 FAILED with SA_AIS_ERR_EXIST (14) create (3) - 1 reboot: Restarting system Here is the analysis: 1. When creating the obj class is done by IMM, logsv has not finished the `apply callback` job yet. In this step, it needs to update a runtime attribute, `saLogStreamCreationTimestamp`. This is done in the main thread. 2. If deleting this obj class comes before the `apply callback` job finishes, IMM will mark that obj class as `IMM_DELETE_LOCK`, call the respective callbacks to logsv and *wait for the response*, but logsv is busy doing the `apply callback` in (1). When logsv requests `update runtime attribute` from IMM, IMM returns TRY_AGAIN. IMM waits for the logsv response to release `IMM_DELETE_LOCK`, while logsv is still stuck in `update rt attribute` getting TRY_AGAIN. Consequently, logsv might be terminated if the try-again limit is reached, or the delete action fails.
[tickets] [opensaf:tickets] #1224 ckpt: enhanced trace log and check of user parameters
- **status**: assigned -- unassigned
- **Milestone**: 5.0 -- future

--- ** [tickets:#1224] ckpt: enhanced trace log and check of user parameters** **Status:** unassigned **Milestone:** future **Created:** Tue Dec 02, 2014 06:32 AM UTC by Ingvar Bergström **Last Updated:** Thu Jul 09, 2015 02:55 AM UTC **Owner:** A V Mahesh (AVM)

The checkpoint service shall provide better trace logging. Checking of user parameters in the library user interface shall be enhanced.
[tickets] [opensaf:tickets] #787 Processes that use opensaf agents get killed if opensafd is stopped when TCP is the transport
- **Milestone**: 4.7-Tentative -- future

--- ** [tickets:#787] Processes that use opensaf agents get killed if opensafd is stopped when TCP is the transport** **Status:** unassigned **Milestone:** future **Created:** Fri Feb 14, 2014 06:36 AM UTC by manu **Last Updated:** Fri Aug 07, 2015 04:15 AM UTC **Owner:** nobody

This issue is seen when the transport is TCP. When opensafd is stopped, application processes that are using OpenSAF services exit. Since 4.4, with the implementation of #220, stopping opensafd is equivalent to a node down. Applications that use OpenSAF should not exit when opensafd is stopped and the transport is TCP.
[tickets] [opensaf:tickets] #1123 AMF: delete of attributes related to ticket #819 is not working properly
- **Priority**: major -- minor

--- ** [tickets:#1123] AMF: delete of attributes related to ticket #819 is not working properly** **Status:** assigned **Milestone:** 4.7-Tentative **Created:** Mon Sep 22, 2014 02:03 PM UTC by hano **Last Updated:** Wed Jul 15, 2015 01:20 PM UTC **Owner:** hano

The problem with ticket #819 related to attribute deletion: when the information that an attribute has been deleted is not known on the amfnd side, amfd uses the value_is_deleted flag to update the attribute with e.g. the global attribute. This needs to be solved by one of the following:

1) Re-introduce the applier.
2) Add a new field/variable to AVSV_PARAM_INFO to e.g. indicate deletion of an attribute value. There may be upgrade problems with this.
3) Do all processing regarding changing base and inherited attributes in amfd.
[tickets] [opensaf:tickets] #819 AMF: support immediate effect when changing comp/hc-type attributes
- **status**: review -- fixed

--- ** [tickets:#819] AMF: support immediate effect when changing comp/hc-type attributes** **Status:** fixed **Milestone:** 5.0 **Created:** Tue Mar 25, 2014 10:16 AM UTC by Hans Feldt **Last Updated:** Tue Jan 20, 2015 08:28 AM UTC **Owner:** hano

This is a continuation of ticket #539. The use case is changing, for example, an HC timeout, which should take effect immediately without the need for a restart. AMF should support the writable attributes of sutype, su, comptype, comp, hctype, hc and SaAmfCompGlobalAttributes -- basically everything related to the execution of components and error handling in the AMF node director.
[tickets] [opensaf:tickets] #1437 AMF: Enhance csi assignment/removal illustration in samples/amf/sa_aware/amf_demo
- **status**: unassigned -- assigned
- **assigned_to**: Minh Hon Chau
- **Milestone**: future -- 4.7-Tentative

--- ** [tickets:#1437] AMF: Enhance csi assignment/removal illustration in samples/amf/sa_aware/amf_demo** **Status:** assigned **Milestone:** 4.7-Tentative **Created:** Thu Aug 06, 2015 07:11 AM UTC by Minh Hon Chau **Last Updated:** Thu Aug 06, 2015 07:11 AM UTC **Owner:** Minh Hon Chau

There are 2 points in this enhancement ticket:

1. Currently, after loading the 2N amfdemo sample, if the command `amf-adm shutdown` is issued on the active SU, amfdemo crashes with this error: **Aug 6 15:04:54 PL-4 amf_demo[577]: saAmfHAStateGet FAILED - 7**. This is due to saAmfHAStateGet() being called with a null csiName (TARGET_ALL). As a sample app, this should be corrected to use saAmfHAStateGet() properly.

2. The sample does not show the csi HA state transition/life cycle (it currently accepts every csi_assign and csi_remove callback without regard to csi existence). This illustration could be added to the amf_demo sample to show how the state goes from STANDBY to ACTIVE, ACTIVE to QUIESCED, etc. It would also be useful for testing whether AMF behaves correctly with its application in terms of HA.