[tickets] [opensaf:tickets] #2142 imm: Compile the IMM library using the C++ compiler

2016-12-12 Thread Hung Nguyen
- **status**: review --> fixed
- **Comment**:

default (5.2)
[staging:3bd4e5]
changeset:   8432:3bd4e5b7a96d
user:Hung Nguyen 
date:Wed Nov 02 11:23:40 2016 +0700
summary: imm: Compile the IMM library using the C++ compiler [#2142]

[staging:c94117]
changeset:   8433:c9411767b601
user:Hung Nguyen 
date:Thu Dec 08 10:51:46 2016 +0700
summary: imm: Fix "crosses initialization" errors [#2142]

[staging:4b9cd9]
changeset:   8434:4b9cd9530600
user:Hung Nguyen 
date:Thu Dec 08 10:57:16 2016 +0700
summary: imm: Fix "invalid conversion" errors. [#2142]

[staging:b0317b]
changeset:   8435:b0317ba353eb
user:Hung Nguyen 
date:Thu Dec 08 10:57:39 2016 +0700
summary: imm: Fix "comparison between signed and unsigned integer" errors [#2142]

[staging:b358f6]
changeset:   8436:b358f65db262
user:Hung Nguyen 
date:Thu Dec 08 10:59:06 2016 +0700
summary: imm: Fix linkage errors [#2142]




---

** [tickets:#2142] imm: Compile the IMM library using the C++ compiler**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Thu Oct 27, 2016 04:31 AM UTC by Hung Nguyen
**Last Updated:** Thu Nov 03, 2016 11:02 AM UTC
**Owner:** Hung Nguyen


Compile the IMM library using the C++ compiler and fix all errors that the 
C++ compiler reports.
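
The changeset summaries in the comment above name four classes of diagnostics. The following is a minimal sketch of what each one typically looks like once fixed. It assumes nothing about the actual IMM sources: the function `imm_example_count_bytes` is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not taken from the IMM sources.
// "linkage errors": declaring the function with C linkage keeps its symbol
// unmangled, so existing C callers can still link against it after the
// file is compiled as C++.
extern "C" int imm_example_count_bytes(const char *text, char wanted);

int imm_example_count_bytes(const char *text, char wanted) {
  // "crosses initialization": every variable is declared before the first
  // goto, so no jump skips over an initialization.
  int count = 0;
  char *copy = nullptr;
  std::size_t len = 0;

  if (text == nullptr) {
    count = -1;
    goto done;
  }

  len = std::strlen(text);
  // "invalid conversion": C++ does not convert void* implicitly, so the
  // result of malloc() needs an explicit cast.
  copy = static_cast<char *>(std::malloc(len + 1));
  if (copy == nullptr) {
    count = -1;
    goto done;
  }
  std::memcpy(copy, text, len + 1);

  // "comparison between signed and unsigned": the index has the same
  // unsigned type as the length it is compared with.
  for (std::size_t i = 0; i < len; ++i) {
    if (copy[i] == wanted) ++count;
  }

done:
  std::free(copy);  // free(nullptr) is a no-op
  return count;
}
```

Calling `imm_example_count_bytes("banana", 'a')` counts the three occurrences of `a`; a null `text` yields -1.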


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2219 ntfd: circular dependency with osafntfimcnd

2016-12-12 Thread Vu Minh Nguyen
Hi Praveen,

Below is my understanding regarding your concerns:
> We need a guarantee that the thread will surely terminate IMCN

I think that the IMCN component, after getting unblocked, will go to 
`handle_sigterm_event()` to handle the SIGTERM signal and exit gracefully.

Another thing: while IMCN is blocked in saImmOiDispatch(), if any IMM 
configuration changes are made, all CCB callbacks to the blocked applier may 
time out, I guess. I see that IMCN sets the IMM sync timeout 
(`IMMA_SYNCR_TIMEOUT`) to 1 second.

> One more thread will get spawned for termination if doing another switchover 

This concern is valid. We should have a flag to prevent this from happening.
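
A minimal sketch of such a flag, assuming the termination is started from more than one place. The class and method names are hypothetical and not taken from the NTF sources.

```cpp
#include <atomic>

// Hypothetical guard that lets only one termination run at a time.
class TerminationGuard {
 public:
  // Returns true for the first caller only. exchange() sets the flag and
  // returns the previous value in one atomic step, so two callers can
  // never both see "not in progress".
  bool try_begin() { return !in_progress_.exchange(true); }

  // Called when the termination has finished, so that a later switchover
  // can start a new one.
  void end() { in_progress_.store(false); }

 private:
  std::atomic<bool> in_progress_{false};
};
```

A caller would spawn the termination thread only when `try_begin()` returns true, and the thread would call `end()` as its last step. A switchover that arrives while the flag is set is simply skipped, so the flag limits the number of threads but does not queue the skipped request.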

/Vu



---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Mon Dec 12, 2016 06:38 AM UTC
**Owner:** Praveen


A circular dependency can be seen when performing a si-swap of 
safSi=SC-2N,safApp=OpenSAF:

1. Active NTFD is trying to sync with Standby using MBC
2. Standby NTFD is in the process of terminating its local osafntfimcnd. It is 
stuck in timedwait_imcn_exit() and cannot reply to the Active.
3. osafntfimcnd [on standby] is trying to send a notification to Active NTFD

So we have (1) depending on (2) depending on (3) depending on (1)

This results in a temporary deadlock that dramatically slows down NTFD's 
ability to process its main dispatch loop. The deadlock only lasts for approx. 
1 second, when mbcsv_mds_send_msg() times out. But since there could be lots of 
MBC messages to send, sometimes osafntfimcnd is killed with SIGABRT generating 
a coredump. The si-swap operation will also time out.

Steps to reproduce:
- Run a loop of ntftest 32
root@SC-1:~# for i in {1..10}; do ntftest 32; done
- On another terminal, keep swapping the 2N OpenSAF SI; a coredump occurs 
after a couple of swaps
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF

~~~
SC-2 (active)

There are a lot of send failures. Each taking approx. 1 second to timeout. 
During these 1 second timeouts, NTFD cannot process the main dispatch loop.

Dec  7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:41.551971 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:41.551979 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:42.557594 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:42.558171 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:42.558179 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:43.564051 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:43.564874 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:43.564883 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:44.572407 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:44.573262 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:44.573271 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:45.575091 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:47.083548 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e

~~~


~~~
SC-1 (standby)

NTFD is trying to terminate osafntfimcnd. While it is doing that, it cannot 
reply to NTFD on SC-2. Meanwhile, osafntfimcnd is sending NTF notifications to 
NTFD on SC-1.

Dec  7 11:01:35.453151 osafntfd [464:ntfs_imcnutil.c:0316] TR 
handle_state_ntfimcn: Terminating osafntfimcnd process
Dec  7 11:01:45.474313 osafntfd [464:ntfs_imcnutil.c:0124] TR   Termination 
timeout
Dec

[tickets] [opensaf:tickets] #2219 ntfd: circular dependency with osafntfimcnd

2016-12-12 Thread Praveen
Hi Vu,
That was the expected behaviour: IMCN, after getting unblocked, will process 
the sig_term event. For this to happen, it has to come out of 
IMMDispatch(DISPATCH_ALL). I think IMMND sends the CCB callbacks for at least 
create/delete/modify and completed in one go. For CCB apply, IMMND has to 
wait for the completed response, which creates a dependency on IMCN. But I 
think this is not synchronous like callbacks in other services. So in our test 
application, which is the only client making configuration changes, no problem 
will occur. However, if multiple clients are making CCB changes, IMMND will be 
sending these to IMCN and they will pile up. 
Anyway, I do not see much risk, but I would still like Minh to clarify the 
problem environment: whether configuration changes are made much faster and by 
multiple clients. I have seen some IMM tickets where the reporter was creating 
thousands of objects.
Yes, by maintaining the flag, spawning of multiple threads can be avoided. But 
we need to remember that maintaining this flag does not block a user from 
triggering switchovers. So on a given switchover, the restart of IMCN may get 
missed.

Thanks,
Praveen


---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Mon Dec 12, 2016 08:47 AM UTC
**Owner:** Praveen



[tickets] [opensaf:tickets] #2219 ntfd: circular dependency with osafntfimcnd

2016-12-12 Thread elunlen
Hi
The original specification for IMCN said that it should be able to handle HA 
state changes. A first increment of IMCN was made without any HA handling, 
except the possibility to inform the IMCN process whether it shall run as 
active or standby when started. To handle an HA state change, the IMCN process 
must be terminated and restarted with the correct state. 
In the "IMCN NTF notification protocol" a special notification was defined (see 
Error Report Notification in the OpenSAF Extensions PR document) to be sent if 
there is a risk that any IMM changes may have been missed. This notification is 
sent every time IMCN is started as active. If a client receives such a 
notification, it has to read all relevant IMM data in order to synchronize 
itself. After that it can rely on notifications from IMCN until the next Error 
Report notification is received.
The intention was to make IMCN able to change role without being restarted and 
without losing track of IMM changes (ticket #157). However, the first increment 
was considered good enough, so #157 was never implemented.
This means that IMCN can be terminated at any time, even while executing an IMM 
callback function. It does not matter if an IMM change is lost and never 
reported. Since IMCN is a process and it is the process that is terminated, 
there is no problem with memory leaks or resource leaks. IMM will detect that 
the client is gone and will finalize all resources.
In NTF, the IMCN surveillance thread will also detect that the IMCN process is 
gone, start a new one and give it the correct state (which may no longer be 
active).
This means that any solution for immediate termination is OK, e.g. handling the 
termination in a separate thread to avoid the described deadlock.
NOTE:
Since there is no HA handling implemented, it is actually not necessary to 
start an IMCN process on any node other than the active. That this is done is 
also a remnant of the original intention.
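
A minimal sketch of the "separate thread" idea, assuming the blocking step (signal IMCN, then wait for it to exit) can be wrapped in a callable. `BackgroundTerminator` is a hypothetical name, not something in the NTF sources.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper: runs a blocking termination step on a worker thread
// so that the caller's dispatch loop is not held up while it waits.
class BackgroundTerminator {
 public:
  ~BackgroundTerminator() { wait(); }

  // Starts the blocking step and returns immediately.
  void start(std::function<void()> blocking_terminate) {
    wait();  // at most one worker at a time
    done_.store(false);
    worker_ = std::thread([this, fn = std::move(blocking_terminate)]() {
      fn();
      done_.store(true);
    });
  }

  // True once the last started termination has completed.
  bool finished() const { return done_.load(); }

  // Blocks until the worker, if any, has completed.
  void wait() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::atomic<bool> done_{true};
  std::thread worker_;
};
```

With this shape the main loop calls `start()` and goes straight back to polling, so it can still answer the peer while the worker waits for IMCN to exit.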

MUST BE FIXED!
I saw that init_ntfimcn() has been moved from the main initialize() to 
initialize_for_assignment(). This is NOT OK since initialize_for_assignment() 
is called more than once, meaning that init_ntfimcn() is also called more than 
once. The init_ntfimcn() function starts the IMCN surveillance thread, and that 
cannot be done more than once!
Other init functions that are called from initialize_for_assignment() have an 
"is_initialized" flag in order to prevent initialization more than once. This 
could be a solution to implement for init_ntfimcn() as well.
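
A minimal sketch of that guard, assuming all calls come from the same thread. The names `init_ntfimcn_once`, `ntfimcn_is_initialized` and the counter are hypothetical; the counter only stands in for "the surveillance thread was started".

```cpp
// Hypothetical state, for illustration only.
bool ntfimcn_is_initialized = false;
int surveillance_threads_started = 0;

// Stand-in for the code that starts the IMCN surveillance thread.
void start_surveillance_thread() { ++surveillance_threads_started; }

// Safe to call on every assignment: only the first call does any work.
int init_ntfimcn_once() {
  if (ntfimcn_is_initialized) {
    return 0;  // already initialized, nothing to do
  }
  start_surveillance_thread();
  ntfimcn_is_initialized = true;
  return 0;
}
```

A plain `bool` is enough only while the calls are made from one thread; if that does not hold, `std::call_once` or an atomic flag would be the safer choice.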

Thanks
Lennart


---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Mon Dec 12, 2016 09:33 AM UTC
**Owner:** Praveen



[tickets] [opensaf:tickets] #2219 ntfd: circular dependency with osafntfimcnd

2016-12-12 Thread Vu Minh Nguyen
Hi Lennart,

It seems the global flag `fully_initialized` in the control block is used to 
ensure that `initialize_for_assignment` runs only once.

Another thing I would like to bring up here is that the timeout for all NTF 
APIs is set to 10 seconds (`#define NTFS_WAIT_TIME 1000`). So when calling one 
of them, e.g. `saNtfNotificationSend()`, in the worst case the calling thread 
will be blocked for up to 10 seconds, which could lead to unexpected results 
such as the syslog below when performing si-swap:

> osafsmfd[9785]: NO Fail to invoke admin operation, rc=SA_AIS_ERR_TIMEOUT (5). 
> dn=[safSi=SC-2N,safApp=OpenSAF], opId=[7]
> osafsmfd[9785]: NO Admin op SA_AMF_ADMIN_SI_SWAP fail [rc = 5]

I am not sure whether that timeout setting is OK.
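
The two figures in the message (a value of 1000 and a timeout of 10 seconds) agree only if `NTFS_WAIT_TIME` is counted in units of 10 ms. That unit is an assumption inferred from the message, not something it states. A small sketch of the arithmetic under that assumption:

```cpp
#include <cstdint>

// Value quoted in the message above.
constexpr std::int64_t kNtfsWaitTime = 1000;

// Assumption: one unit is 10 ms, which is what the 10-second figure implies.
constexpr std::int64_t kMillisecondsPerUnit = 10;

// Converts a wait time in units to milliseconds.
constexpr std::int64_t wait_time_ms(std::int64_t units) {
  return units * kMillisecondsPerUnit;
}

// 1000 units * 10 ms = 10000 ms = 10 s
static_assert(wait_time_ms(kNtfsWaitTime) == 10000, "1000 units is 10 s");
```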

/Vu



---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Mon Dec 12, 2016 09:53 AM UTC
**Owner:** Praveen



[tickets] [opensaf:tickets] #2221 build: OpenSAF fails to build outside the source tree

2016-12-12 Thread Zoran Milinkovic
- **status**: review --> fixed
- **Comment**:

default(5.2) :

changeset:   8437:1a479e952a20
tag: tip
user:Zoran Milinkovic 
date:Thu Dec 08 16:42:57 2016 +0100
summary: build: add top build directory to include path [#2221]




---

** [tickets:#2221] build: OpenSAF fails to build outside the source tree**

**Status:** fixed
**Milestone:** 5.2.FC
**Created:** Thu Dec 08, 2016 03:29 PM UTC by Zoran Milinkovic
**Last Updated:** Thu Dec 08, 2016 03:52 PM UTC
**Owner:** Zoran Milinkovic


~~~
build for SLES12 with OpenSAF 5.2 CS8417 outside the source tree fails

make[9]: Leaving directory 
`/home/user/git/build/opensaf/osaf/libs/core/cplusplus/base/tests'
make[9]: Entering directory 
`/home/user/git/build/opensaf/osaf/libs/core/cplusplus/base'
  CXX  libbase_la-file_notify.lo
  CXX  libbase_la-log_message.lo
  CXX  libbase_la-getenv.lo
  CXX  libbase_la-process.lo
  CXX  libbase_la-mutex.lo
  CXX  libbase_la-condition_variable.lo
  CXX  libbase_la-unix_client_socket.lo
  CXX  libbase_la-unix_server_socket.lo
  CXX  libbase_la-unix_socket.lo
../../../../../../../coremw/opensaf/osaf/libs/core/cplusplus/base/mutex.cc:19:22:
 fatal error: ./config.h: No such file or directory
 #include "./config.h"
  ^
compilation terminated.
make[9]: *** [libbase_la-mutex.lo] Error 1
make[9]: *** Waiting for unfinished jobs
make[9]: Leaving directory 
`/home/user/git/build/opensaf/osaf/libs/core/cplusplus/base'
~~~

After ./configure, config.h is created in the build directory.
The root of the build directory needs to be added to the include path.




[tickets] [opensaf:tickets] #2226 smf: failed one-step upgrade between opensaf versions

2016-12-12 Thread Rafael



---

** [tickets:#2226] smf: failed one-step upgrade between opensaf versions**

**Status:** unassigned
**Milestone:** 5.1.1
**Created:** Mon Dec 12, 2016 03:00 PM UTC by Rafael
**Last Updated:** Mon Dec 12, 2016 03:00 PM UTC
**Owner:** Rafael


When upgrading in one-step mode from a version before SmfExecControlHdl.cc 
was introduced to a version that uses it, the upgrade fails. The reason for the 
failure is that SMF tries to read "procExecMode" from 
"openSafSmfExecControl=SmfHdlCopy", but this object is only created during init 
in a version that uses SmfExecControlHdl.cc.

The solution is to read the original value of procExecMode if the copy is 
missing. The drawback of this solution is that changing procExecMode during a 
running campaign, but after the init phase, will cause errors.
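
A minimal sketch of that fallback. The map-based `FakeImm` store, both function names and the `original_dn` parameter are hypothetical stand-ins for illustration; only the copy DN and the attribute name come from the description above, and the real code would read from IMM instead.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory stand-in for IMM: object DN -> attribute -> value.
using FakeImm = std::map<std::string, std::map<std::string, std::uint32_t>>;

// Returns the attribute value, or std::nullopt if the object or the
// attribute does not exist.
std::optional<std::uint32_t> read_attr(const FakeImm& imm,
                                       const std::string& dn,
                                       const std::string& attr) {
  auto object = imm.find(dn);
  if (object == imm.end()) return std::nullopt;
  auto value = object->second.find(attr);
  if (value == object->second.end()) return std::nullopt;
  return value->second;
}

// Prefers the copy created during init and falls back to the original
// object when the copy is missing (the upgrade-from-older-version case).
std::optional<std::uint32_t> read_proc_exec_mode(
    const FakeImm& imm, const std::string& original_dn) {
  const std::string copy_dn = "openSafSmfExecControl=SmfHdlCopy";
  if (auto mode = read_attr(imm, copy_dn, "procExecMode")) return mode;
  return read_attr(imm, original_dn, "procExecMode");
}
```

The fallback also shows where the stated drawback comes from: when the copy is absent, the value seen is whatever the original object holds at read time, so a change made after the init phase is picked up mid-campaign.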




[tickets] [opensaf:tickets] #2134 AMF: Update RTA saAmfSISUHAState to IMM

2016-12-12 Thread Minh Hon Chau
Hi Praveen,

Although part 2 of #1725 is pushed, there is a chance that 
avd_sg_su_si_mod_snd() and avd_susi_mod_send() update this attribute 
differently. To avoid backward incompatibility in the future, I would like to 
document that reading saAmfSISUHAState while an assignment is ongoing is not 
recommended/supported. Users should only read this attribute after the admin 
operation/failover completes. Do you agree?

Thanks,
Minh


---

** [tickets:#2134] AMF: Update RTA saAmfSISUHAState to IMM**

**Status:** wontfix
**Milestone:** never
**Created:** Thu Oct 20, 2016 07:58 PM UTC by Minh Hon Chau
**Last Updated:** Thu Nov 24, 2016 05:25 AM UTC
**Owner:** nobody


In the 2N si-swap scenario, when AMFD sends a QUIESCED su_si assignment msg 
(for example) to AMFND that changes the HA state of a SUSI assignment, AMFD 
updates its local state AVD_SU_SI_REL::state and checkpoints this change to the 
standby AMFD. However, AMFD does not update saAmfSISUHAState until it receives 
the su_si assignment response. Question:
(1). Should AMFD update the runtime attribute saAmfSISUHAState in IMM as soon 
as the local @state gets updated in the implementer, so that IMM, active AMFD 
and standby AMFD are all in sync?
(2). Or should AMFD update saAmfSISUHAState in IMM only when it receives the 
su_si assignment response from AMFND, as is currently implemented for some 
reason (to avoid exposing the change of saAmfSISUHAState to the user too early?)

Grepping for "avd_susi_update", which updates saAmfSISUHAState in IMM, also 
shows an inconsistency in usage: avd_susi_mod_send() sends the su_si msg and 
updates saAmfSISUHAState immediately, while avd_sg_su_si_mod_snd() does not. 

Headless recovery relies on IMM to restore the state. If saAmfSISUHAState is 
not updated promptly and the node is rebooted during the headless stage, then 
after headless the saAmfSISUHAState read from IMM does not match many other 
states (SG fsm, SUSI fsm, saAmfSISUHAState of the other SUSIs).

My question is whether doing (1) will cause any problem for a normal cluster. 
The pending patches for #1725 part 2 currently implement (1).





[tickets] [opensaf:tickets] #2223 NTFD: Increment iterator out of range NotificationMap

2016-12-12 Thread Minh Hon Chau
- **status**: assigned --> review



---

** [tickets:#2223] NTFD: Increment iterator out of range NotificationMap**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Dec 12, 2016 12:37 AM UTC by Minh Hon Chau
**Last Updated:** Mon Dec 12, 2016 12:37 AM UTC
**Owner:** Minh Hon Chau


In NtfAdmin::checkNotificationList(), when deleteConfirmedNotification() 
deletes the last item, the NotificationMap shrinks and the following posNot++ 
moves the iterator out of range of the NotificationMap instead of making it 
equal to NotificationMap.end().
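
A minimal sketch of one common way to avoid this, assuming the container behaves like a `std::map`. The simplified `NotificationMap` type and the `delete_confirmed` function are hypothetical and are not the actual NtfAdmin code or the fix that was applied.

```cpp
#include <map>

// Hypothetical simplified map: notification id -> "is confirmed".
using NotificationMap = std::map<unsigned, bool>;

// Deletes all confirmed notifications and returns how many were removed.
unsigned delete_confirmed(NotificationMap& notifications) {
  unsigned deleted = 0;
  for (auto pos = notifications.begin(); pos != notifications.end();) {
    // Step to the next element first. Erasing from a std::map invalidates
    // only iterators to the erased element, so 'pos' stays valid.
    auto current = pos++;
    if (current->second) {
      notifications.erase(current);
      ++deleted;
    }
  }
  return deleted;
}
```

Because the iterator is advanced before the deletion, removing the last element leaves `pos` equal to `end()` and the loop terminates normally.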

Got an ntfd coredump during switchover.

ntfd coredump backtrace:

~~~
(gdb) bt full
#0  NtfNotification::getNextSubscription (this=0x978930, subId=...) at 
NtfNotification.cc:165
No locals.
#1  0x0041658d in NtfAdmin::checkNotificationList (this=0x974240) at 
NtfAdmin.cc:661
uSubId = {clientId = 1277, subscriptionId = 1}
notification = {> = {
_M_ptr = 0x978930, _M_refcount = {_M_pi = 0x978a40}}, }
__FUNCTION__ = "checkNotificationList"
posNot = {_M_node = 0x978770}
#2  0x0040407e in amf_csi_set_callback (invocation=4290772996, 
compName=, 
new_haState=, csiDescriptor=...) at ntfs_amf.c:255
error = 
prev_haState = SA_AMF_HA_STANDBY
role_change = 
rc = 
__FUNCTION__ = "amf_csi_set_callback"
#3  0x7ff32687e48e in ava_hdl_cbk_rec_prc (info=0x7ff32000b880, 
reg_cbk=reg_cbk@entry=0x7ffc7437fd00) at ava_hdl.cc:655
csi_set = 0x7ff32000b898
actv_or_stdby = 
rc = 1
i = 
__FUNCTION__ = "ava_hdl_cbk_rec_prc"
#4  0x7ff32687f749 in ava_hdl_cbk_dispatch_all (hdl_rec=0x7ffc7437fda8, 
cb=0x7ffc7437fda0)
at ava_hdl.cc:446
list_resp = 0x962698
reg_cbk = {saAmfHealthcheckCallback = 0x403ee0 
, 
  saAmfComponentTerminateCallback = 0x403f00 
, 
  saAmfCSISetCallback = 0x403fd0 , 
  saAmfCSIRemoveCallback = 0x403e70 , 
  saAmfProtectionGroupTrackCallback = 0x0, 
saAmfProtectionGroupTrackCallback_4 = 0x0, 
  saAmfProxiedComponentInstantiateCallback = 0x0, 
  saAmfProxiedComponentCleanupCallback = 0x0, 
  saAmfContainedComponentInstantiateCallback = 0x0, 
  saAmfContainedComponentCleanupCallback = 0x0, 
osafCsiAttributeChangeCallback = 0x0}
rec = 0x7ff32000c2a0
hdl = 4288675841
rc = 1
#5  ava_hdl_cbk_dispatch (cb=0x7ffc7437fda0, hdl_rec=0x7ffc7437fda8, 
flags=)
at ava_hdl.cc:320
rc = 1
__FUNCTION__ = "ava_hdl_cbk_dispatch"
#6  0x7ff3268783ea in AmfAgent::Dispatch (hdl=4288675841, 
flags=flags@entry=SA_DISPATCH_ALL)
at amf_agent.cc:283
hdl_rec = 0x962610
pend_dis = 0
__FUNCTION__ = "Dispatch"
cb = 0x961b20
rc = SA_AIS_OK
pend_fin = 0
#7  0x7ff326878465 in saAmfDispatch (hdl=, 
flags=flags@entry=SA_DISPATCH_ALL)
at amf_agent.cc:244
No locals.
#8  0x004037e4 in main (argc=, argv=) at 
ntfs_main.c:357
mbx_fd = 
error = 
rc = 
fds = {{fd = 20, events = 1, revents = 0}, {fd = 16, events = 1, 
revents = 1}, {fd = 22, 
events = 1, revents = 1}, {fd = 14, events = 1, revents = 0}, {fd = 
24, events = 1, 
revents = 0}, {fd = 18, events = 1, revents = 0}}
term_fd = 20
__FUNCTION__ = "main"
(gdb) 

~~~

ntfd trace:
~~~
Dec  9 22:39:28.984523 osafntfd [453:ntfs_amf.c:0175] >> amf_csi_set_callback 
Dec  9 22:39:28.984527 osafntfd [453:ntfs_main.c:0271] >> 
initialize_for_assignment: ha_state = 1
Dec  9 22:39:28.984531 osafntfd [453:ntfs_main.c:0292] << 
initialize_for_assignment: rc = 1
Dec  9 22:39:28.984535 osafntfd [453:ntfs_amf.c:0043] >> 
amf_active_state_handler: HA ACTIVE request
Dec  9 22:39:28.984539 osafntfd [453:ntfs_amf.c:0047] << 
amf_active_state_handler 
Dec  9 22:39:28.984556 osafntfd [453:ntfs_mbcsv.c:0180] >> 
ntfs_mbcsv_change_HA_state 
Dec  9 22:39:28.984560 osafntfd [453:mbcsv_api.c:0662] >> 
mbcsv_process_chg_role_request: Change HA role for the checkpoint
Dec  9 22:39:28.984564 osafntfd [453:mbcsv_api.c:0685] TR svc_id:44, 
pwe_hdl:65550
Dec  9 22:39:28.984568 osafntfd [453:mbcsv_api.c:0743] << 
mbcsv_process_chg_role_request: retval: 1
Dec  9 22:39:28.984571 osafntfd [453:ntfs_mbcsv.c:0194] << 
ntfs_mbcsv_change_HA_state 
Dec  9 22:39:28.984575 osafntfd [453:ntfs_amf.c:0248] TR amf_csi_set_callback 
NTFS changing HA role from SA_AMF_HA_STANDBY to SA_AMF_HA_ACTIVE
Dec  9 22:39:28.984578 osafntfd [453:amf_agent.cc:1987] >> Response: 
SaAmfHandleT passed is ffa1
Dec  9 22:39:28.984582 osafntfd [453:ava_hdl.cc:0890] >> ava_hdl_pend_resp_get 
Dec  9 22:39:28.984585 osafntfd [453:ava_hdl.cc:0906] << ava_hdl_pend_resp_get 
Dec  9 22:39:28.984589 osafntfd [453:ava_mds.cc:0342] >> ava_mds_send 
Dec  9 22:39:28.984601 osafntfd [453:ava_mds.cc:0928] >> ava_mds_msg_async_send 
Dec  9 22:39:28.984606 osafntfd [453:ava_mds.cc:0181] >> ava_mds_cbk 
D

[tickets] [opensaf:tickets] #2219 ntfd: circular dependency with osafntfimcnd

2016-12-12 Thread Minh Hon Chau
Hi,

I think both patches can solve the problem. V1 (Praveen's) removes the 
dependency in ntfimcnd, V3 (Vu's) removes the dependency in ntfd. Below is a 
summary I have collected from previous discussions:
- V3 (Vu's)
. There could be periods with no active ntfimcnd: this happens when both 
termination threads are terminating imcnd at the same time (one of them should 
act earlier), and the active imcnd is stuck in ntfInitialize()
. The period with 2 active ntfimcnd could be longer: in the state transition 
from quiesced->standby, while the termination thread is still ongoing, the old 
active imcnd could stay active a bit longer
I think the difficulty in V3 is that ntfd cannot know what is happening in both 
the local and the remote ntfimcnd

- V1 (Praveen's)
. When ntfimcnd gets SIGTERM, the job from dispatching the IMM callback to 
sending the notification could be interrupted. If we make sigterm_handler sync 
with IMM's thread, the added latency could cause ntfd to send an abort signal
. But what if loss of notifications is always assumed, and we have a special 
notification to inform clients at ntfimcnd's start-up?
. The IMM thread needs to synchronize with sigterm_handler on common resources, 
e.g. the immOI handle, ...
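
One common way to avoid interrupting a job halfway is to let the signal handler only record the request and have the worker check it between jobs. The Python sketch below illustrates that pattern only; it is not ntfimcnd code, and `run`, `send` and `stop_requested` are hypothetical names. It also shows where the latency comes from: the process exits only after the job in progress has finished.

```python
import signal
import threading

# Hypothetical flag shared between the handler and the worker loop.
stop_requested = threading.Event()


def sigterm_handler(signum, frame):
    # Only record the request; no shared handles are touched from signal context.
    stop_requested.set()


def run(jobs, send):
    """Process jobs in order; stop only between jobs, never in the middle of one.

    `send` is a stand-in for "dispatch the callback and send the notification".
    Returns the number of jobs that were fully processed.
    """
    done = 0
    for job in jobs:
        if stop_requested.is_set():
            break
        send(job)
        done += 1
    return done


def install_handler():
    # Must be called from the main thread (a Python restriction on signal.signal).
    signal.signal(signal.SIGTERM, sigterm_handler)
```

If a single job can take long, the termination wait on the ntfd side would have to be at least that long, which is the trade-off raised above.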

Thanks
/Minh


---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Mon Dec 12, 2016 10:26 AM UTC
**Owner:** Praveen


A circular dependency can be seen when performing a si-swap of 
safSi=SC-2N,safApp=OpenSAF:

1. Active NTFD is trying to sync with Standby using MBC
2. Standby NTFD is in the process of terminating its local osafntfimcnd. It is 
stuck in timedwait_imcn_exit() and cannot reply to the Active.
3. osafntfimcnd [on standby] is trying to send a notification to Active NTFD

So we have (1) depending on (2) depending on (3) depending on (1)

This results in a temporary deadlock that dramatically slows down NTFD's 
ability to process its main dispatch loop. Each deadlock only lasts for approx. 
1 second, until mbcsv_mds_send_msg() times out. But since there can be many 
MBC messages to send, osafntfimcnd is sometimes killed with SIGABRT, generating 
a coredump. The si-swap operation will also time out.
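
The dependency chain above can be written down as a small wait-for graph. This is only an illustrative Python sketch, not OpenSAF code: the node names are labels for the three parties in the description, and `find_cycle` is a helper made up for this example.

```python
# Each key waits for the parties listed as its value.
WAITS_FOR = {
    "active_ntfd": ["standby_ntfd"],       # (1) MBC sync needs a reply from standby
    "standby_ntfd": ["standby_ntfimcnd"],  # (2) blocked in timedwait_imcn_exit()
    "standby_ntfimcnd": ["active_ntfd"],   # (3) notification send needs active NTFD
}


def find_cycle(graph):
    """Return one cycle as a list of nodes (start node repeated at the end), or None."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Found a back edge: the cycle is the tail of the current path.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


print(find_cycle(WAITS_FOR))
```

Removing any one edge from the graph makes `find_cycle` return `None`, which matches the idea that dropping the dependency on either the ntfd side or the ntfimcnd side breaks the deadlock.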

Steps to reproduce:
- Run a loop of ntftest 32
root@SC-1:~# for i in {1..10}; do ntftest 32; done
- On another terminal, keep swapping the 2N OpenSAF SI; a coredump occurs after 
a couple of swaps
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF

~~~
SC-2 (active)

There are a lot of send failures, each taking approx. 1 second to time out. 
During these 1-second timeouts, NTFD cannot process the main dispatch loop.

Dec  7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:41.551971 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:41.551979 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:42.557594 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:42.558171 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:42.558179 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:43.564051 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:43.564874 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:43.564883 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:44.572407 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:44.573262 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:44.573271 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:45.575091 os

[tickets] [opensaf:tickets] Re: #2219 ntfd: circular dependency with osafntfimcnd

2016-12-12 Thread Minh Hon Chau
Hi Vu,

For the NtfSend() API, the 10 secs may have to cover saflogging at the server 
side, writing to disk, ... so I guess we should not change the timeout for 
NtfSend(). For the other APIs we could look at how many critical jobs the server 
is involved in, and then decide whether to reduce this timeout, perhaps for 
those APIs doing finalize jobs.

thanks,
Minh


---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Tue Dec 13, 2016 04:40 AM UTC
**Owner:** Praveen


A circular dependency can be seen when performing a si-swap of 
safSi=SC-2N,safApp=OpenSAF:

1. Active NTFD is trying to sync with Standby using MBC
2. Standby NTFD is in the process of terminating its local osafntfimcnd. It is 
stuck in timedwait_imcn_exit() and cannot reply to the Active.
3. osafntfimcnd [on standby] is trying to send a notification to Active NTFD

So we have (1) depending on (2) depending on (3) depending on (1)

This results in a temporary deadlock that dramatically slows down NTFD's 
ability to process its main dispatch loop. Each deadlock only lasts for approx. 
1 second, until mbcsv_mds_send_msg() times out. But since there can be many 
MBC messages to send, osafntfimcnd is sometimes killed with SIGABRT, generating 
a coredump. The si-swap operation will also time out.

Steps to reproduce:
- Run a loop of ntftest 32
root@SC-1:~# for i in {1..10}; do ntftest 32; done
- On another terminal, keep swapping the 2N OpenSAF SI; a coredump occurs after 
a couple of swaps
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF

~~~
SC-2 (active)

There are a lot of send failures, each taking approx. 1 second to time out. 
During these 1-second timeouts, NTFD cannot process the main dispatch loop.

Dec  7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:41.551971 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:41.551979 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:42.557594 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:42.558171 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:42.558179 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:43.564051 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:43.564874 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:43.564883 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:44.572407 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:44.573262 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
Dec  7 11:01:44.573271 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
Dec  7 11:01:45.575091 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
Dec  7 11:01:47.083548 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e

~~~
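
The roughly 1 second per failed send can be read directly from the trace timestamps. The following Python sketch pairs each `>>` entry of a function with its next `<<` entry and reports the elapsed time. It assumes every trace entry sits on one line (the archive above wraps them) and that the trace does not cross a month boundary; `call_durations` and `ENTRY` are names invented for this example.

```python
import re

# Matches e.g. "Dec  7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> func: ..."
ENTRY = re.compile(
    r"^\w{3}\s+(?P<day>\d+) (?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}\.\d{6}) "
    r"\S+ \[[^\]]+\] (?P<dir>>>|<<) (?P<func>\w+)"
)


def _seconds(m):
    # Seconds since the start of the month; only differences are used.
    return ((int(m["day"]) * 24 + int(m["h"])) * 60 + int(m["m"])) * 60 + float(m["s"])


def call_durations(lines, func):
    """Return the seconds between each '>> func' entry and the following '<< func'."""
    durations, start = [], None
    for line in lines:
        m = ENTRY.match(line)
        if not m or m["func"] != func:
            continue  # TR lines and other functions are ignored
        if m["dir"] == ">>":
            start = _seconds(m)
        elif start is not None:
            durations.append(_seconds(m) - start)
            start = None
    return durations


# First two send attempts from the SC-2 trace above, with the line wraps removed.
SAMPLE = [
    "Dec  7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e",
    "Dec  7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP:",
    "Dec  7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure",
    "Dec  7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e",
    "Dec  7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure",
]

for seconds in call_durations(SAMPLE, "mbcsv_mds_send_msg"):
    print(f"{seconds:.3f} s")
```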


~~~
SC-1 (standby)

NTFD is trying to terminate osafntfimcnd. While it is doing that, it cannot 
reply to NTFD on SC-2. Meanwhile, osafntfimcnd is sending NTF notifications to 
NTFD on SC-1.

Dec  7 11:01:35.453151 osafntfd [464:ntfs_imcnutil.c:0316] TR 
handle_state_ntfimcn: Terminating osafntfimcnd process
Dec  7 11:01:45.474313 osafntfd [464:ntfs_imcnutil.c:0124] TR   Termination 
timeout
Dec  7 11:01:45.474375 osafntfd [464:ntfs_imcnutil.c:0130] << 
wait_imcnproc_termination: rc = -1, retry_cnt = 101
Dec  7 11:01:45.474387 osafntfd [464:ntfs_imcnutil.c:0168] TR   Normal 
termination failed. Escalate to abort
Dec  7 11:01:45.574703 osafntfd [464:ntfs_imcnutil.c:0172] TR   Imcn 
successfully aborted
D

[tickets] [opensaf:tickets] #2227 smf:ONE-STEP upgrade failed due to duplicated entities in comp and SU

2016-12-12 Thread Neelakanta Reddy



---

** [tickets:#2227] smf:ONE-STEP upgrade failed due to duplicated entities in 
comp and SU**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Dec 13, 2016 06:16 AM UTC by Neelakanta Reddy
**Last Updated:** Tue Dec 13, 2016 06:16 AM UTC
**Owner:** nobody


This ticket is an extension of #2209.
#2209 covers a campaign that contains both a rolling and a single-step 
procedure, where the single-step forAddRemove AU/DU contains a node that is 
also present in the rolling upgrade.

This ticket covers duplicated SU and comp entities present in both the 
forAddRemove and the rolling procedure.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2134 AMF: Update RTA saAmfSISUHAState to IMM

2016-12-12 Thread Praveen
Hi Minh,
I think we cannot use "not supported". Instead we should use the words "not 
recommended". Suppose the SG is in a transition state because assignments are 
in progress: if a user queries the attribute at this time, the user will 
certainly get some value, but this value may change from one query to the next 
while the assignments are ongoing. Once the admin operation or escalation 
finishes, the value will remain the same in each query. 

Thanks,
Praveen


---

** [tickets:#2134] AMF: Update RTA saAmfSISUHAState to IMM**

**Status:** wontfix
**Milestone:** never
**Created:** Thu Oct 20, 2016 07:58 PM UTC by Minh Hon Chau
**Last Updated:** Mon Dec 12, 2016 11:35 PM UTC
**Owner:** nobody


In the scenario of a 2N si-swap, when AMFD sends a QUIESCED su_si assignment 
msg (for example) to AMFND that changes the HA state of the SUSI assignment, 
AMFD updates its local state AVD_SU_SI_REL::state and checkpoints this change 
to the standby AMFD. However, AMFD does not update saAmfSISUHAState until it 
receives the su_si assignment response. Question:
(1). Should AMFD update the runtime attribute saAmfSISUHAState in IMM as soon 
as the local @state gets updated in the implementer, so that IMM, the active 
AMFD and the standby AMFD are all in sync?
(2). Or should AMFD update saAmfSISUHAState in IMM only when it receives the 
su_si assignment response from AMFND, as currently implemented for some reason 
(to avoid exposing the change of saAmfSISUHAState to the user too early?)

grep "avd_susi_update", which updates saAmfSISUHAState in IMM: there is also an 
inconsistency in usage. avd_susi_mod_send() sends the su_si msg and also 
updates saAmfSISUHAState immediately, while avd_sg_su_si_mod_snd does 
otherwise. 

The headless recovery relies on IMM to restore the state. If saAmfSISUHAState 
is not updated punctually and the node is rebooted during the headless stage, 
then after headless the saAmfSISUHAState read from IMM does not fit with many 
other states (SG fsm, SUSI fsm, saAmfSISUHAState of the other SUSIs).

My question is whether doing (1) will cause any problem for a normal cluster. 
Pending patches #1725 part 2 currently implement (1).
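
The difference between (1) and (2) can be illustrated with a toy model. This Python sketch is not AMFD code: `Susi`, `send_assignment` and `on_response` are hypothetical names, and `imm_state` merely stands in for saAmfSISUHAState as stored in IMM. It only shows the window in which the local state and the IMM value disagree; it does not answer whether (1) is safe for a normal cluster.

```python
from dataclasses import dataclass


@dataclass
class Susi:
    """Toy model of one SU-SI assignment (not the real AVD_SU_SI_REL)."""
    local_state: str = "ACTIVE"
    imm_state: str = "ACTIVE"   # stand-in for saAmfSISUHAState in IMM
    pending: str = ""           # HA state awaiting a response, if any


def send_assignment(susi, new_state, eager):
    """Model sending a su_si assignment msg; eager=True is option (1)."""
    susi.local_state = new_state   # local state changes when the msg is sent
    susi.pending = new_state
    if eager:
        susi.imm_state = new_state


def on_response(susi):
    """Model the assignment response; option (2) updates the IMM value here."""
    if susi.pending:
        susi.imm_state = susi.pending
        susi.pending = ""


def consistent(susi):
    return susi.local_state == susi.imm_state


lazy = Susi()
send_assignment(lazy, "QUIESCED", eager=False)
print("option (2) before response:", consistent(lazy))

eager = Susi()
send_assignment(eager, "QUIESCED", eager=True)
print("option (1) before response:", consistent(eager))
```

In this model, a reboot between the send and the response would leave the option (2) object with an IMM value that lags the local state, which is the headless-recovery mismatch described above.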


