[tickets] [opensaf:tickets] #2582 amfnd: handle TIMEOUT in amf_saImmOmAccessorGet_o2

2017-11-06 Thread Gary Lee via Opensaf-tickets
- **status**: accepted --> not-reproducible



---

** [tickets:#2582] amfnd: handle TIMEOUT in amf_saImmOmAccessorGet_o2**

**Status:** not-reproducible
**Milestone:** 5.18.01
**Created:** Wed Sep 13, 2017 12:59 AM UTC by Gary Lee
**Last Updated:** Fri Nov 03, 2017 09:50 PM UTC
**Owner:** Gary Lee


Sometimes IMM returns TIMEOUT to AMFND when calling 
immutil_saImmOmAccessorGet_o2().

Aug 20 07:08:46 PL-4 osafamfnd[7326]: ER amf_saImmOmAccessorGet_o2 FAILED for 
'safComp=ABC,safSu=PL-4,safSg=NoRed-ABC,safApp=ABC'
Aug 20 07:08:46 PL-4 osafamfnd[7326]: NO Component CLC fsm exited with error 
for comp:safComp=ABC,safSu=PL-4,safSg=NoRed-ABC,safApp=ABC
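
One possible way to handle the TIMEOUT is to retry the call a bounded number of 
times before giving up. The sketch below is only an illustration and not the 
AMFND code: `AisRc` and `RetryOnTimeout` are hypothetical names, and the real 
call to immutil_saImmOmAccessorGet_o2() is represented by a callable supplied 
by the caller.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for SaAisErrorT. The numeric values follow saAis.h
// (SA_AIS_OK = 1, SA_AIS_ERR_TIMEOUT = 5, SA_AIS_ERR_TRY_AGAIN = 6).
enum AisRc { kOk = 1, kErrTimeout = 5, kErrTryAgain = 6 };

// Calls `op` at most `max_attempts` times, sleeping `delay` between attempts,
// for as long as it keeps returning TIMEOUT or TRY_AGAIN. Any other return
// code (success or a hard error) ends the loop immediately.
inline AisRc RetryOnTimeout(const std::function<AisRc()>& op, int max_attempts,
                            std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  AisRc rc = kErrTimeout;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    rc = op();
    if (rc != kErrTimeout && rc != kErrTryAgain) break;
    // Do not sleep after the final attempt.
    if (attempt + 1 < max_attempts) std::this_thread::sleep_for(delay);
  }
  return rc;
}
```

The caller still has to decide what to do when the last attempt also times 
out; the sketch simply returns the last code.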



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2670 log: log agent may crash after recovery fails in log api

2017-11-06 Thread Canh Truong via Opensaf-tickets



---

** [tickets:#2670] log: log agent may crash after recovery fails in log api**

**Status:** accepted
**Milestone:** 5.18.01
**Created:** Tue Nov 07, 2017 03:21 AM UTC by Canh Truong
**Last Updated:** Tue Nov 07, 2017 03:21 AM UTC
**Owner:** Canh Truong


In the log API, the client is deleted from the list after recovery fails. If 
other threads are calling the log API with the same client at that point, the 
agent may crash:

    if (client->is_recovery_failed() == true) {
      ScopeLock critical_section(get_delete_obj_sync_mutex);
      RemoveLogClient(&client);
      ais_rc = SA_AIS_ERR_BAD_HANDLE;
      return ais_rc;
    }


In the saLogFinalize API, the client should be removed from the list if 
client->is_client_initialized() == false
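
The hazard described above (one thread deleting a client that another thread 
is still using) can be illustrated with a small registry that hands out shared 
ownership. This is a sketch under assumptions, not the fix applied in the log 
agent: `Client` and `ClientRegistry` are hypothetical names, and the real agent 
uses its own client class and locking.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a log client; not the real agent class.
struct Client {
  bool recovery_failed = false;
};

class ClientRegistry {
 public:
  void Add(unsigned handle) {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(clients_.count(handle) == 0);  // handles are expected to be unique
    clients_[handle] = std::make_shared<Client>();
  }

  // Returns a shared owner, or nullptr if the handle is not in the list.
  std::shared_ptr<Client> Find(unsigned handle) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = clients_.find(handle);
    if (it == clients_.end()) return nullptr;
    return it->second;
  }

  // Removes the handle from the list. Returns false if it was already gone,
  // so a second caller can report a bad handle instead of deleting twice.
  bool Remove(unsigned handle) {
    std::lock_guard<std::mutex> lock(mutex_);
    return clients_.erase(handle) == 1;
  }

 private:
  std::mutex mutex_;
  std::map<unsigned, std::shared_ptr<Client>> clients_;
};
```

A thread that obtained the client through `Find()` before the removal keeps a 
valid object until it drops its pointer; later lookups get nullptr.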




[tickets] [opensaf:tickets] #1809 msg: APIs returning SA_AIS_OK even when cluster node has left cluster membership

2017-11-06 Thread Alex Jones via Opensaf-tickets
- **status**: unassigned --> fixed
- **Blocker**:  --> False
- **Comment**:

This has been fixed by ticket 2308.



---

** [tickets:#1809] msg: APIs returning SA_AIS_OK even when cluster node has 
left cluster membership**

**Status:** fixed
**Milestone:** future
**Created:** Thu May 05, 2016 09:52 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 20, 2016 05:58 PM UTC
**Owner:** nobody
**Attachments:**

- 
[mqsv_demo_app.c](https://sourceforge.net/p/opensaf/tickets/1809/attachment/mqsv_demo_app.c)
 (4.0 kB; text/x-c)


Setup:
Changeset- 7436
Version - opensaf 5.0
The issue is observed with last two releases also.

Issue: When a node is locked by the CLM service, saMsgQueueOpen() returns 
SA_AIS_OK when it should have returned SA_AIS_ERR_UNAVAILABLE according to the 
spec, as quoted below:

If the cluster node has left the cluster membership or is being administratively
evicted from the cluster membership, the Message Service behaves as follows
towards processes residing on that node and using or attempting to use the 
service:
⇒ Calls to saMsgInitialize() will fail with SA_AIS_ERR_UNAVAILABLE.
⇒ All Message Service APIs that are invoked by the process and that operate on
handles already acquired by the process will fail with
SA_AIS_ERR_UNAVAILABLE with the following exceptions, assuming that the
handle msgHandle has already been acquired:

Steps to Reproduce:
1. Compile and build the attached test code.
2. After initialization, the test sleeps for 10 sec.
3. During this sleep, lock the CLM node from another node in the cluster.
4. saMsgQueueOpen returns successfully.

Expected: saMsgQueueOpen should fail with rc = 31 (SA_AIS_ERR_UNAVAILABLE)
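
The expected behaviour can be pictured as a membership check in front of every 
API call. The following is a minimal sketch and not the Message Service code: 
`NodeState` and `GuardedCall` are hypothetical names, and only the numeric 
values of the return codes (1 and 31) are taken from saAis.h.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for SaAisErrorT (SA_AIS_OK = 1,
// SA_AIS_ERR_UNAVAILABLE = 31, matching "rc = 31" in the ticket).
enum AisRc { kOk = 1, kErrUnavailable = 31 };

// Hypothetical view of what the agent knows about its own node.
struct NodeState {
  bool is_cluster_member = true;
};

// Runs `api_call` only while the node is a cluster member; otherwise the
// call is rejected with UNAVAILABLE and `api_call` is never invoked.
inline AisRc GuardedCall(const NodeState& node,
                         const std::function<AisRc()>& api_call) {
  assert(static_cast<bool>(api_call));  // a real callable must be supplied
  if (!node.is_cluster_member) return kErrUnavailable;
  return api_call();
}
```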




[tickets] [opensaf:tickets] #2665 imm: update IMM documents with new changes

2017-11-06 Thread Zoran Milinkovic via Opensaf-tickets
- **status**: unassigned --> review
- **assigned_to**: Zoran Milinkovic
- **Milestone**: 5.18.01 --> 5.17.11
- **Comment**:

https://sourceforge.net/p/opensaf/mailman/message/36105476/



---

** [tickets:#2665] imm: update IMM documents with new changes**

**Status:** review
**Milestone:** 5.17.11
**Created:** Wed Nov 01, 2017 01:29 PM UTC by Zoran Milinkovic
**Last Updated:** Fri Nov 03, 2017 09:50 PM UTC
**Owner:** Zoran Milinkovic


Update README and IMMSv PR document




[tickets] [opensaf:tickets] #2669 dtm: segv in osafdtmd

2017-11-06 Thread Hans Nordebäck via Opensaf-tickets



---

** [tickets:#2669] dtm: segv in osafdtmd**

**Status:** accepted
**Milestone:** 5.18.01
**Created:** Mon Nov 06, 2017 03:13 PM UTC by Hans Nordebäck
**Last Updated:** Mon Nov 06, 2017 03:13 PM UTC
**Owner:** Hans Nordebäck


In dtm_main.cc, function "main" calls "dtm_node_discovery_task_create()" before 
"dtm_service_discovery_init(dtms_cb)", and it is the latter that initializes 
dtm_intranode_cb. The thread started by "dtm_node_discovery_task_create()" 
(which is also a real-time thread) may therefore run while dtm_intranode_cb is 
still uninitialized, leading to this segv. dtm_intranode_cb has to be 
initialized before "dtm_node_discovery_task_create()" is called.

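The ordering requirement can be sketched as follows. This is an illustration 
under assumptions, not the osafdtmd code: `IntranodeCb`, `ServiceDiscoveryInit`, 
`NodeDiscoveryTask` and `StartInOrder` are hypothetical stand-ins, and a plain 
std::thread replaces the real-time task.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for dtm_intranode_cb; the real structure differs.
struct IntranodeCb {
  int mailbox = 42;
};

// Null until initialized, which is the hazard described in the ticket.
static std::unique_ptr<IntranodeCb> g_cb;

// Plays the role of the init step that sets up the shared control block.
void ServiceDiscoveryInit() { g_cb = std::make_unique<IntranodeCb>(); }

// Plays the role of the discovery task, which uses the control block.
int NodeDiscoveryTask() {
  assert(g_cb != nullptr);  // dereferencing a null block is the crash path
  return g_cb->mailbox;
}

int StartInOrder() {
  ServiceDiscoveryInit();  // 1. initialize the shared state first
  int seen = 0;
  // 2. only then start the thread; thread creation happens after the init
  std::thread worker([&seen] { seen = NodeDiscoveryTask(); });
  worker.join();
  return seen;
}
```

Swapping the two steps in `StartInOrder()` would reintroduce the race, because 
the worker could run before `g_cb` is set.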

Program terminated with signal SIGSEGV, Segmentation fault.
00:58:34 #0  ncs_ipc_send (mbx=0xd0, msg=msg@entry=0x7f1c8c010be0, 
prio=prio@entry=NCS_IPC_PRIORITY_HIGH) at src/base/sysf_ipc.c:535
00:58:34 [Current thread is 1 (Thread 0x7f1c9523db00 (LWP 154))]
00:58:34 
00:58:34 Thread 4 (Thread 0x7f1c9525db00 (LWP 153)):
00:58:34 #0  0x7f1c9433cb5d in poll () at 
../sysdeps/unix/syscall-template.S:84
00:58:34 No locals.
00:58:34 #1  0x7f1c94dee120 in poll (__timeout=-1, __nfds=1, 
__fds=0x7f1c9525d260) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
00:58:34 No locals.
00:58:34 #2  osaf_poll_no_timeout (io_fds=0x7f1c9525d260, i_nfds=1) at 
src/base/osaf_poll.c:31
00:58:34 No locals.
00:58:34 #3  0x7f1c94dee365 in osaf_ppoll 
(io_fds=io_fds@entry=0x7f1c9525d260, i_nfds=i_nfds@entry=1, i_timeout_ts=0x0, 
i_sigmask=i_sigmask@entry=0x0) at src/base/osaf_poll.c:82
00:58:34 start_time = {tv_sec = 139760738095744, tv_nsec = 0}
00:58:34 time_left_ts = 
00:58:34 result = -1792683424
00:58:34 #4  0x7f1c94df551f in ncs_tmr_wait () at src/base/sysf_tmr.c:463
00:58:34 rc = 
00:58:34 inds_rmvd = 
00:58:34 next_delay = 
00:58:34 tv = 
00:58:34 ts_current = {tv_sec = 410674, tv_nsec = 637832336}
00:58:34 ts = {tv_sec = 16777215, tv_nsec = 0}
00:58:34 set = {fd = 6, events = 1, revents = 0}
00:58:34 #5  0x7f1c946126ba in start_thread (arg=0x7f1c9525db00) at 
pthread_create.c:333
00:58:34 __res = 
00:58:34 pd = 0x7f1c9525db00
00:58:34 now = 
00:58:34 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139760738097920, 
3941995848775809230, 1, 140734297406815, 139760738098624, 140734297408496, 
-3995288678512936754, -3995290365788505906}, mask_was_saved = 0}}, priv = {pad 
= {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
00:58:34 not_first_call = 
00:58:34 pagesize_m1 = 
00:58:34 sp = 
00:58:34 freesize = 
00:58:34 __PRETTY_FUNCTION__ = "start_thread"
00:58:34 #6  0x7f1c9434882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
00:58:34 No locals.
00:58:34 
00:58:34 Thread 3 (Thread 0x7f1c95260740 (LWP 150)):
00:58:34 #0  __clock_nanosleep (clock_id=clock_id@entry=1, flags=flags@entry=1, 
req=req@entry=0x7fff41ce0390, rem=rem@entry=0x0) at 
../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
00:58:34 oldstate = 0
00:58:34 r = 
00:58:34 rem = 0x0
00:58:34 req = 0x7fff41ce0390
00:58:34 flags = 1
00:58:34 clock_id = 
00:58:34 #1  0x7f1c94def3b4 in osaf_nanosleep 
(sleep_duration=0x7fff41ce0430) at src/base/osaf_time.c:44
00:58:34 wakeup_time = {tv_sec = 410675, tv_nsec = 402517787}
00:58:34 retval = 
00:58:34 #2  0x555a767ecb85 in base::Sleep (duration=...) at 
./src/base/time.h:135
00:58:34 No locals.
00:58:34 #3  main (argc=, argv=) at 
src/dtm/dtmnd/dtm_main.cc:312
00:58:34 rc = 
00:58:34 dis_time_out_usec = 500
00:58:34 dis_elapsed_time_usec = 50
00:58:34 t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
00:58:34 __FUNCTION__ = "main"
00:58:34 dtms_cb = 0x555a76f16f60
00:58:34 
00:58:34 Thread 2 (Thread 0x7f1c9521db00 (LWP 157)):
00:58:34 #0  0x7f1c9433cb5d in poll () at 
../sysdeps/unix/syscall-template.S:84
00:58:34 No locals.
00:58:34 #1  0x555a767f5c32 in poll (__timeout=2, __nfds=, __fds=) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
00:58:34 No locals.
00:58:34 #2  dtm_intranode_processing () at src/dtm/dtmnd/dtm_intra.cc:670
00:58:34 poll_ret_val = 0
00:58:34 j = 
00:58:34 t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
00:58:34 __FUNCTION__ = "dtm_intranode_processing"
00:58:34 #3  0x7f1c946126ba in start_thread (arg=0x7f1c9521db00) at 
pthread_create.c:333
00:58:34 __res = 
00:58:34 pd = 0x7f1c9521db00
00:58:34 now = 
00:58:34 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139760737835776, 
3941995848775809230, 1, 140734297407007, 139760737836480, 140734297408000, 
-3995288712872675122, -3995290365788505906}, mask_was_saved = 0}}, priv = {pad 
= {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype 

[tickets] [opensaf:tickets] #1242 LOG: Incorrect error return in filehandling function

2017-11-06 Thread elunlen via Opensaf-tickets
- **status**: accepted --> unassigned
- **assigned_to**: elunlen -->  nobody 
- **Blocker**:  --> False
- **Milestone**: 5.17.08 --> future



---

** [tickets:#1242] LOG: Incorrect error return in filehandling function**

**Status:** unassigned
**Milestone:** future
**Created:** Wed Jan 21, 2015 04:06 PM UTC by elunlen
**Last Updated:** Mon Apr 10, 2017 01:40 PM UTC
**Owner:** nobody


Function log_stream_config_change() in file lgs_stream.c always returns an 
error if called in mode create_files_f = false




[tickets] [opensaf:tickets] #2647 ntfd: ntfimcnd crashed on handling for Object creation callback

2017-11-06 Thread Srinivas Siva Mangipudy via Opensaf-tickets
Hi Srinivas

If something happens that prevents ntfimcn from sending notifications, ntfimcn 
shall be restarted so that a possible missed notification message is sent. If 
this may happen in “normal” situations, ntfimcn shall just exit without any 
coredump. Possibly a notification (LOG_NO) should be written to the syslog. If 
the problem is “not normal”, something that should never happen (an error that 
must be analyzed and maybe fixed), an abort is better since it gives us a 
back-trace of what has happened. Before any abort is done, an error message 
shall be written to syslog (LOG_ER). This error message should contain 
information about what has happened and where it has happened (__FUNCTION__, 
__LINE__) in order to have at least some information if a coredump is lost, 
could not be generated, no back-trace was created etc.

So, the question is: is this something that could happen in a “real” system, 
and is it of any interest to get a core-dump to analyze the problem? In any 
case ntfimcn will recover.
If the answer is that the event triggering the core-dump is not a failure that 
needs to be analyzed and that it is something ntfimcn should just gracefully 
recover from, then change to LOG_NO + _Exit.
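
The policy described above can be sketched as below. It is an illustration 
only, not the ntfimcn code: `ExitPlan`, `PlanFor`, `Describe` and 
`LogAndTerminate` are hypothetical names, plain syslog() is used in place of 
the LOG_NO/LOG_ER macros (assumed here to correspond to LOG_NOTICE and 
LOG_ERR), and the exit status is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <syslog.h>

enum class ExitAction { kQuietExit, kAbortWithCore };

struct ExitPlan {
  int syslog_priority;
  ExitAction action;
};

// Events that can occur in normal operation get a notice and a plain exit;
// events that should never happen get an error message and an abort.
inline ExitPlan PlanFor(bool can_happen_in_normal_operation) {
  if (can_happen_in_normal_operation) {
    return {LOG_NOTICE, ExitAction::kQuietExit};
  }
  return {LOG_ERR, ExitAction::kAbortWithCore};
}

// Builds a message that records what happened and where, so that some
// information survives even if the coredump is lost.
inline std::string Describe(const char* what, const char* function, int line) {
  return std::string(what) + " (" + function + ":" + std::to_string(line) + ")";
}

// Always logs before terminating; the caller is restarted by its supervisor.
[[noreturn]] inline void LogAndTerminate(bool can_happen_in_normal_operation,
                                         const char* what,
                                         const char* function, int line) {
  const ExitPlan plan = PlanFor(can_happen_in_normal_operation);
  syslog(plan.syslog_priority, "%s", Describe(what, function, line).c_str());
  if (plan.action == ExitAction::kAbortWithCore) std::abort();
  std::_Exit(EXIT_FAILURE);  // exit status chosen for illustration
}
```

A call site would pass `__FUNCTION__` and `__LINE__` for the last two 
arguments.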

Regards
Lennart

From: Srinivas Mangipudy [mailto:srinivas.mangip...@oracle.com] 
Sent: den 2 november 2017 16:42
To: Lennart Lund ; Minh Hon Chau 

Cc: Ravi Sekhar Reddy Konda 
Subject: RE: OSAF : ntfimcnd core dump issue -- 2647.

Hi Lennart,

Thanks a lot for the explanation.
It was really helpful.

Do you think it is better to log the error and then call _exit (we will have 
a graceful restart) than to call abort and allow the process to dump core 
in this case?
Please let me know your thoughts about this.

Thanks and best regards
Srinivas


From: Lennart Lund [mailto:lennart.l...@ericsson.com] 
Sent: Thursday, November 2, 2017 6:26 PM
To: Minh Hon Chau ; Srinivas Mangipudy 

Cc: Ravi Sekhar Reddy Konda ; Lennart Lund 

Subject: RE: OSAF : ntfimcnd core dump issue -- 2647.

Hi

Ntfimcn implements a so-called “special applier”. This is an IMM applier that 
receives IMM callbacks in the same way as an object implementer or an ordinary 
applier. The difference is that a special applier does not request to become 
applier for any specific objects or classes. Instead a configuration attribute 
in an object can be given a flag, ATTR_NOTIFY.
The general error handling in ntfimcn is to exit. This will be detected by the 
osafntfd process, which then restarts osafntfimcnd. To notify that this has 
happened (ntfimcn may have “missed” some IMM modifications while it was down) 
ntfimcn always sends a special notification (a “may have lost notifications” 
notification) when it is started. This notification is used by com-sa to 
request com to do a (re)synchronization.
The “normal” way for ntfimcn to exit is by calling _Exit. When something 
happens that should never happen, abort() is used instead (a coredump will be 
created).
If something not normal and really bad is done in the system, an abort is 
justified. This means that ntfimcn shall not be made to “defensively” avoid 
aborting (coredump) if the cluster system and IMM are abused. Instead, if we 
want to test something like this, it shall be accepted that a coredump is 
created; this is not a Fail! However, if ntfimcn is not restarted, or no “may 
have lost notifications” notification is sent, that is to be considered a 
Fail.
Note: If ntfimcn exits (or aborts), the node or ntf as such is not affected. It 
is only the ntfimcn process that is restarted.

Ticket #2647 should probably not be fixed; instead it should be set to invalid.

/Lennart

From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: den 2 november 2017 05:45
To: Srinivas Mangipudy 
Cc: Ravi Sekhar Reddy Konda ; Lennart Lund 

Subject: Re: OSAF : ntfimcnd core dump issue -- 2647.

Hi Srinivas,
+ Lennart.
If I understand correctly, the test creates a huge number of objects (are they 
RT or Config?), and while the callbacks are coming, the test deletes the class. 
The later callbacks can't find the class name, so it aborts.
I think we can defensively avoid the coredump and not send the notification, as 
you suggest, but I'm wondering about the integrity of IMM, as the IMM user 
receives a callback but the associated class does not exist.
Thanks,
Minh

On 01/11/17 22:04, Srinivas Mangipudy wrote:
Hi Minh,
 
This is regarding Notification issue 
https://sourceforge.net/p/opensaf/tickets/2647/.
 
This issue occurs because IMM deleted the objects and ntfimcnd was not able 
to fetch the object, so the “SA_AIS_ERR_NOT_EXIST” error was returned.
Since “SA_AIS_ERR_NOT_EXIST” was returned, ntfimcnd aborted, leading to a core 
dump.
 
I have fixed the core dump, but I have a question regarding the notification to 
be sent.
I think in this case ntfimcnd should not be sending the notification at all, 
since it could not retrieve the class details.
Is that fix fine? Or do you suggest something else? Please let me know you