[tickets] [opensaf:tickets] #2582 amfnd: handle TIMEOUT in amf_saImmOmAccessorGet_o2
- **status**: accepted --> not-reproducible

---

** [tickets:#2582] amfnd: handle TIMEOUT in amf_saImmOmAccessorGet_o2**

**Status:** not-reproducible
**Milestone:** 5.18.01
**Created:** Wed Sep 13, 2017 12:59 AM UTC by Gary Lee
**Last Updated:** Fri Nov 03, 2017 09:50 PM UTC
**Owner:** Gary Lee

Sometimes IMM returns TIMEOUT to AMFND when calling immutil_saImmOmAccessorGet_o2().

Aug 20 07:08:46 PL-4 osafamfnd[7326]: ER amf_saImmOmAccessorGet_o2 FAILED for 'safComp=ABC,safSu=PL-4,safSg=NoRed-ABC,safApp=ABC'
Aug 20 07:08:46 PL-4 osafamfnd[7326]: NO Component CLC fsm exited with error for comp:safComp=ABC,safSu=PL-4,safSg=NoRed-ABC,safApp=ABC

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

-- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot

___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
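One common way to handle a transient TIMEOUT such as the one reported in #2582 is a bounded retry around the failing call. The following is only a sketch under stated assumptions, not the fix applied in OpenSAF: `RetryOnTimeout` and the `AisError` enum are hypothetical names, and the enum is a local stand-in that mirrors the numeric values of the corresponding SA_AIS_* return codes.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Local stand-ins for a few SAF AIS return codes (hypothetical names,
// numeric values mirror SA_AIS_OK, SA_AIS_ERR_TIMEOUT, SA_AIS_ERR_TRY_AGAIN).
enum AisError { kAisOk = 1, kAisErrTimeout = 5, kAisErrTryAgain = 6 };

// Hypothetical helper: call `op` until it returns something other than
// TIMEOUT or TRY_AGAIN, or until `max_attempts` calls have been made.
// The last return code is handed back to the caller unchanged.
AisError RetryOnTimeout(const std::function<AisError()>& op, int max_attempts,
                        std::chrono::milliseconds delay) {
  AisError rc = kAisErrTimeout;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    rc = op();
    if (rc != kAisErrTimeout && rc != kAisErrTryAgain) break;
    // Back off before the next attempt, but not after the final one.
    if (attempt + 1 < max_attempts) std::this_thread::sleep_for(delay);
  }
  return rc;
}
```

A caller would wrap the accessor-get call in a lambda and treat a final TIMEOUT as a real failure only after the retries are exhausted.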
[tickets] [opensaf:tickets] #2670 log: log agent may crash after recovery fails in log api
---

** [tickets:#2670] log: log agent may crash after recovery fails in log api**

**Status:** accepted
**Milestone:** 5.18.01
**Created:** Tue Nov 07, 2017 03:21 AM UTC by Canh Truong
**Last Updated:** Tue Nov 07, 2017 03:21 AM UTC
**Owner:** Canh Truong

In the log API, the client is deleted from the list after recovery fails. If other threads call the log API with the same client at that point, a crash may happen.

    if (client->isrecoveryfailed() == true) {
      ScopeLock criticalsection(getdeleteobjsyncmutex);
      RemoveLogClient(&client);
      aisrc = SA_AIS_ERR_BAD_HANDLE;
      return aisrc;
    }

In the saLogFinalize API, the client should be removed from the list if client->is_client_initialized() == false.
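The hazard described in #2670 is that one thread destroys a client object while another thread still holds a pointer to it. A generic way to avoid that is shared ownership: removing the client only unlinks it from the list, and the object is destroyed when the last user lets go. The sketch below illustrates that idea only; `Client` and `ClientRegistry` are hypothetical names and this is not the log agent's actual data structure.

```cpp
#include <algorithm>
#include <atomic>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the agent's per-handle client object.
struct Client {
  explicit Client(unsigned h) : handle(h) {}
  unsigned handle;
  std::atomic<bool> recovery_failed{false};
};

// Hypothetical registry: the list and every API thread share ownership.
class ClientRegistry {
 public:
  std::shared_ptr<Client> Add(unsigned handle) {
    std::lock_guard<std::mutex> lock(mutex_);
    clients_.push_back(std::make_shared<Client>(handle));
    return clients_.back();
  }

  // Returns a shared reference, or nullptr if the handle is unknown.
  std::shared_ptr<Client> Find(unsigned handle) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const auto& c : clients_) {
      if (c->handle == handle) return c;
    }
    return nullptr;
  }

  // Unlinks the client from the list. The object itself is destroyed only
  // when the last thread holding a reference releases it.
  void Remove(unsigned handle) {
    std::lock_guard<std::mutex> lock(mutex_);
    clients_.erase(
        std::remove_if(clients_.begin(), clients_.end(),
                       [handle](const std::shared_ptr<Client>& c) {
                         return c->handle == handle;
                       }),
        clients_.end());
  }

 private:
  std::mutex mutex_;
  std::vector<std::shared_ptr<Client>> clients_;
};
```

With this layout a thread that found the client before the removal can still read it safely, while later lookups fail and can be mapped to SA_AIS_ERR_BAD_HANDLE.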
[tickets] [opensaf:tickets] #1809 msg: APIs returning SA_AIS_OK even when cluster node has left cluster membership
- **status**: unassigned --> fixed
- **Blocker**: --> False
- **Comment**:

This has been fixed by ticket 2308.

---

** [tickets:#1809] msg: APIs returning SA_AIS_OK even when cluster node has left cluster membership**

**Status:** fixed
**Milestone:** future
**Created:** Thu May 05, 2016 09:52 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 20, 2016 05:58 PM UTC
**Owner:** nobody
**Attachments:**

- [mqsv_demo_app.c](https://sourceforge.net/p/opensaf/tickets/1809/attachment/mqsv_demo_app.c) (4.0 kB; text/x-c)

Setup:
Changeset - 7436
Version - opensaf 5.0
The issue is also observed with the last two releases.

Issue:
When a node is locked by the CLM service, saMsgQueueOpen() returns SA_AIS_OK when it should have returned SA_AIS_ERR_UNAVAILABLE according to the spec, as quoted below:

If the cluster node has left the cluster membership or is being administratively evicted from the cluster membership, the Message Service behaves as follows towards processes residing on that node and using or attempting to use the service:
⇒ Calls to saMsgInitialize() will fail with SA_AIS_ERR_UNAVAILABLE.
⇒ All Message Service APIs that are invoked by the process and that operate on handles already acquired by the process will fail with SA_AIS_ERR_UNAVAILABLE with the following exceptions, assuming that the handle msgHandle has already been acquired:

Steps to reproduce:
1. Compile and build the attached test code.
2. After initialization, the test sleeps for 10 seconds.
3. During this sleep, lock the CLM node from another node in the cluster.
4. saMsgQueueOpen returns successfully.

Expected:
saMsgQueueOpen should fail with rc = 31 (SA_AIS_ERR_UNAVAILABLE)
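The behaviour the spec asks for in #1809 amounts to a membership check in front of every handle-based API call. The sketch below shows that pattern in isolation, assuming a flag that is updated from cluster membership tracking. `MembershipGate` and the `AisError` enum are hypothetical names (the enum mirrors the numeric values of SA_AIS_OK and SA_AIS_ERR_UNAVAILABLE), and this is not how ticket 2308 implemented the fix.

```cpp
#include <atomic>

// Local stand-ins for two SAF AIS return codes (hypothetical names,
// numeric values mirror SA_AIS_OK and SA_AIS_ERR_UNAVAILABLE).
enum AisError { kAisOk = 1, kAisErrUnavailable = 31 };

// Hypothetical gate: remembers whether the local node is currently a
// cluster member and rejects operations while it is not.
class MembershipGate {
 public:
  // Assumed to be called from a membership-tracking callback.
  void set_member(bool is_member) { is_member_.store(is_member); }

  // Runs `op` only while the node is a member; otherwise the operation is
  // skipped and the UNAVAILABLE code is returned.
  template <typename Op>
  AisError Guard(Op op) const {
    if (!is_member_.load()) return kAisErrUnavailable;
    return op();
  }

 private:
  std::atomic<bool> is_member_{true};
};
```

In the reproduction steps above, locking the CLM node would flip the flag during the sleep, so the subsequent queue-open call would return 31 instead of succeeding.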
[tickets] [opensaf:tickets] #2665 imm: update IMM documents with new changes
- **status**: unassigned --> review
- **assigned_to**: Zoran Milinkovic
- **Milestone**: 5.18.01 --> 5.17.11
- **Comment**:

https://sourceforge.net/p/opensaf/mailman/message/36105476/

---

** [tickets:#2665] imm: update IMM documents with new changes**

**Status:** review
**Milestone:** 5.17.11
**Created:** Wed Nov 01, 2017 01:29 PM UTC by Zoran Milinkovic
**Last Updated:** Fri Nov 03, 2017 09:50 PM UTC
**Owner:** Zoran Milinkovic

Update README and IMMSv PR document
[tickets] [opensaf:tickets] #2669 dtm: segv in osafdtmd
---

** [tickets:#2669] dtm: segv in osafdtmd**

**Status:** accepted
**Milestone:** 5.18.01
**Created:** Mon Nov 06, 2017 03:13 PM UTC by Hans Nordebäck
**Last Updated:** Mon Nov 06, 2017 03:13 PM UTC
**Owner:** Hans Nordebäck

In dtm_main.cc, function "main" calls "dtm_node_discovery_task_create()" before "dtm_service_discovery_init(dtms_cb)", and it is the latter that initializes dtm_intranode_cb. So "dtm_node_discovery_task_create()" (which is also a real-time thread) may be running with dtm_intranode_cb not yet initialized, leading to this segv. The init of dtm_intranode_cb has to be done before calling "dtm_node_discovery_task_create()".

Program terminated with signal SIGSEGV, Segmentation fault.
00:58:34 #0 ncs_ipc_send (mbx=0xd0, msg=msg@entry=0x7f1c8c010be0, prio=prio@entry=NCS_IPC_PRIORITY_HIGH) at src/base/sysf_ipc.c:535
00:58:34 [Current thread is 1 (Thread 0x7f1c9523db00 (LWP 154))]
00:58:34
00:58:34 Thread 4 (Thread 0x7f1c9525db00 (LWP 153)):
00:58:34 #0 0x7f1c9433cb5d in poll () at ../sysdeps/unix/syscall-template.S:84
00:58:34 No locals.
00:58:34 #1 0x7f1c94dee120 in poll (__timeout=-1, __nfds=1, __fds=0x7f1c9525d260) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
00:58:34 No locals.
00:58:34 #2 osaf_poll_no_timeout (io_fds=0x7f1c9525d260, i_nfds=1) at src/base/osaf_poll.c:31
00:58:34 No locals.
00:58:34 #3 0x7f1c94dee365 in osaf_ppoll (io_fds=io_fds@entry=0x7f1c9525d260, i_nfds=i_nfds@entry=1, i_timeout_ts=0x0, i_sigmask=i_sigmask@entry=0x0) at src/base/osaf_poll.c:82
00:58:34     start_time = {tv_sec = 139760738095744, tv_nsec = 0}
00:58:34     time_left_ts =
00:58:34     result = -1792683424
00:58:34 #4 0x7f1c94df551f in ncs_tmr_wait () at src/base/sysf_tmr.c:463
00:58:34     rc =
00:58:34     inds_rmvd =
00:58:34     next_delay =
00:58:34     tv =
00:58:34     ts_current = {tv_sec = 410674, tv_nsec = 637832336}
00:58:34     ts = {tv_sec = 16777215, tv_nsec = 0}
00:58:34     set = {fd = 6, events = 1, revents = 0}
00:58:34 #5 0x7f1c946126ba in start_thread (arg=0x7f1c9525db00) at pthread_create.c:333
00:58:34     __res =
00:58:34     pd = 0x7f1c9525db00
00:58:34     now =
00:58:34     unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139760738097920, 3941995848775809230, 1, 140734297406815, 139760738098624, 140734297408496, -3995288678512936754, -3995290365788505906}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
00:58:34     not_first_call =
00:58:34     pagesize_m1 =
00:58:34     sp =
00:58:34     freesize =
00:58:34     __PRETTY_FUNCTION__ = "start_thread"
00:58:34 #6 0x7f1c9434882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
00:58:34 No locals.
00:58:34
00:58:34 Thread 3 (Thread 0x7f1c95260740 (LWP 150)):
00:58:34 #0 __clock_nanosleep (clock_id=clock_id@entry=1, flags=flags@entry=1, req=req@entry=0x7fff41ce0390, rem=rem@entry=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
00:58:34     oldstate = 0
00:58:34     r =
00:58:34     rem = 0x0
00:58:34     req = 0x7fff41ce0390
00:58:34     flags = 1
00:58:34     clock_id =
00:58:34 #1 0x7f1c94def3b4 in osaf_nanosleep (sleep_duration=0x7fff41ce0430) at src/base/osaf_time.c:44
00:58:34     wakeup_time = {tv_sec = 410675, tv_nsec = 402517787}
00:58:34     retval =
00:58:34 #2 0x555a767ecb85 in base::Sleep (duration=...) at ./src/base/time.h:135
00:58:34 No locals.
00:58:34 #3 main (argc=, argv=) at src/dtm/dtmnd/dtm_main.cc:312
00:58:34     rc =
00:58:34     dis_time_out_usec = 500
00:58:34     dis_elapsed_time_usec = 50
00:58:34     t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
00:58:34     __FUNCTION__ = "main"
00:58:34     dtms_cb = 0x555a76f16f60
00:58:34
00:58:34 Thread 2 (Thread 0x7f1c9521db00 (LWP 157)):
00:58:34 #0 0x7f1c9433cb5d in poll () at ../sysdeps/unix/syscall-template.S:84
00:58:34 No locals.
00:58:34 #1 0x555a767f5c32 in poll (__timeout=2, __nfds=, __fds=) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
00:58:34 No locals.
00:58:34 #2 dtm_intranode_processing () at src/dtm/dtmnd/dtm_intra.cc:670
00:58:34     poll_ret_val = 0
00:58:34     j =
00:58:34     t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
00:58:34     __FUNCTION__ = "dtm_intranode_processing"
00:58:34 #3 0x7f1c946126ba in start_thread (arg=0x7f1c9521db00) at pthread_create.c:333
00:58:34     __res =
00:58:34     pd = 0x7f1c9521db00
00:58:34     now =
00:58:34     unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139760737835776, 3941995848775809230, 1, 140734297407007, 139760737836480, 140734297408000, -3995288712872675122, -3995290365788505906}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype
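The ordering problem reported in #2669 can be shown in isolation: shared state must be initialized before the thread that reads it is created. The sketch below uses hypothetical names (`ControlBlock`, `InitControlBlock`, `ReadMailbox`, `StartInCorrectOrder`) and `std::thread` instead of the OpenSAF task API, so it illustrates the principle rather than the actual dtm code.

```cpp
#include <memory>
#include <thread>

// Hypothetical stand-in for a control block that owns a mailbox.
struct ControlBlock {
  int mailbox = 42;
};

// Null until InitControlBlock() has run.
std::unique_ptr<ControlBlock> g_cb;

void InitControlBlock() { g_cb = std::make_unique<ControlBlock>(); }

// What the worker does with the shared state. Returns -1 when called
// before initialization instead of dereferencing a null pointer.
int ReadMailbox() { return g_cb ? g_cb->mailbox : -1; }

// Correct start-up order: initialize first, create the thread second.
// Thread creation happens after the initialization, so the worker is
// guaranteed to see the initialized control block.
int StartInCorrectOrder() {
  InitControlBlock();
  int seen = 0;
  std::thread worker([&seen] { seen = ReadMailbox(); });
  worker.join();
  return seen;
}
```

Swapping the two steps in `StartInCorrectOrder` would reproduce the race: the worker could run while the control block pointer is still null.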
[tickets] [opensaf:tickets] #1242 LOG: Incorrect error return in filehandling function
- **status**: accepted --> unassigned
- **assigned_to**: elunlen --> nobody
- **Blocker**: --> False
- **Milestone**: 5.17.08 --> future

---

** [tickets:#1242] LOG: Incorrect error return in filehandling function**

**Status:** unassigned
**Milestone:** future
**Created:** Wed Jan 21, 2015 04:06 PM UTC by elunlen
**Last Updated:** Mon Apr 10, 2017 01:40 PM UTC
**Owner:** nobody

Function log_stream_config_change() in file lgs_stream.c always returns an error if called in mode create_files_f = false.
[tickets] [opensaf:tickets] #2647 ntfd: ntfimcnd crashed on handling for Object creation callback
Hi Srinivas

If something happens that prevents ntfimcn from sending notifications, ntfimcn shall be restarted so that a possible missed notification message is sent. If this may happen in “normal” situations, ntfimcn shall just exit without any coredump. Possibly a notification (LOG_NO) should be written to the syslog.

If the problem is “not normal”, something that should never happen (an error that must be analyzed and maybe fixed), an abort is better since it gives us a back-trace of what has happened. Before any abort is done, an error message shall be written to syslog (LOG_ER). This error message should contain information about what has happened and where it has happened (__FUNCTION__, __LINE__) in order to have at least some information if a coredump is lost, could not be generated, no back-trace was created, etc.

So, the question is: is this something that could happen in a “real” system, and is it of any interest to get a core-dump to analyze the problem? In any case ntfimcn will recover. If the answer is that the event triggering the core-dump is not a failure to analyze, and that it is something ntfimcn should just gracefully recover from, then change to LOG_NO + _Exit.

Regards
Lennart

From: Srinivas Mangipudy [mailto:srinivas.mangip...@oracle.com]
Sent: 2 November 2017 16:42
To: Lennart Lund; Minh Hon Chau
Cc: Ravi Sekhar Reddy Konda
Subject: RE: OSAF : ntfimcnd core dump issue -- 2647.

Hi Lennart,

Thanks a lot for the explanation. It was really helpful. Do you think it is better to log the error and then call _exit (we will have a graceful restart) than calling abort and allowing the process to dump core in this case? Please let me know your thoughts about this.

Thanks and best regards
Srinivas

From: Lennart Lund [mailto:lennart.l...@ericsson.com]
Sent: Thursday, November 2, 2017 6:26 PM
To: Minh Hon Chau; Srinivas Mangipudy
Cc: Ravi Sekhar Reddy Konda; Lennart Lund
Subject: RE: OSAF : ntfimcnd core dump issue -- 2647.
Hi

Ntfimcn implements a so-called “special applier”. This is an IMM applier that receives IMM callbacks in the same way as an object implementer or an ordinary applier. The difference is that a special applier is not requesting to become applier for any special objects or classes. Instead a configuration attribute in an object can be given a flag, ATTR_NOTIFY.

The general error handling in ntfimcn is to exit. This will be detected by the osafntfd process, which then restarts osafntfimcnd. To notify that this has happened (ntfimcn may have “missed” some IMM modifications while it was down), ntfimcn always sends a special notification (may have lost notifications notification) when it is started. This notification is used by com-sa to request com to do a (re)synchronization.

The “normal” way for ntfimcn to exit is by calling _Exit. When something happens that should never happen, abort() is used instead (a coredump will be created).

If something not normal and really bad is done in the system, an abort is motivated. This means that ntfimcn shall not be made to “defensively” avoid aborting (coredump) if the cluster system and IMM are abused. Instead, if we want to test something like this, it shall be accepted that a coredump is created; this is not a Fail! However, if ntfimcn is not restarted, or no “may have lost notification notification” is sent, that is to be considered a Fail.

Note: If ntfimcn exits (or aborts), the node or ntf as such is not affected. It is only the ntfimcn process that is restarted.

Ticket #2647 should probably not be fixed; instead it should be set to invalid.

/Lennart

From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
Sent: 2 November 2017 05:45
To: Srinivas Mangipudy
Cc: Ravi Sekhar Reddy Konda; Lennart Lund
Subject: Re: OSAF : ntfimcnd core dump issue -- 2647.

Hi Srinivas, + Lennart.
If I understand correctly, the test creates a huge number of objects (are they RT or Config?), and while the callbacks are coming, the test deletes the class. The later callbacks can't find the class name, so it aborts. I think we can defensively avoid the coredump and not send the notification, as you suggest, but I'm wondering about the integrity of IMM, as the IMM user receives a callback but the associated class does not exist.

Thanks,
Minh

On 01/11/17 22:04, Srinivas Mangipudy wrote:

Hi Minh,

This is regarding Notification issue https://sourceforge.net/p/opensaf/tickets/2647/.

This issue occurs because IMM deleted the objects and ntfimcnd was not able to fetch the object, so the “SA_AIS_ERR_NOT_EXIST” error was returned. Since “SA_AIS_ERR_NOT_EXIST” was returned, ntfimcnd aborted, leading to a core dump.

I have fixed the core dump, but I have a question regarding the notification to be sent. I think in this case ntfimcnd should not be sending the notification at all, since it could not retrieve the class details. Is that fix fine? Or do you suggest something else? Please let me know you