[tickets] [opensaf:tickets] #2037 IMM: Immd asserted on active controller in backward compatability

2016-09-14 Thread Madhurika Koppula
Logs are attached.


Attachments:

- 
[immd_assert.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/98733abe/7f1c/attachment/immd_assert.tgz)
 (18.7 MB; application/octet-stream)


---

** [tickets:#2037] IMM: Immd asserted on active controller in backward 
compatability**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:51 AM UTC by Madhurika Koppula
**Last Updated:** Thu Sep 15, 2016 06:51 AM UTC
**Owner:** nobody
**Attachments:**

- 
[immnd_immd_cores.rtf](https://sourceforge.net/p/opensaf/tickets/2037/attachment/immnd_immd_cores.rtf)
 (8.5 kB; application/rtf)


**Environment Details:**

OS : SUSE 64-bit
Setup : 4 nodes (2 controllers and 2 payloads, with the headless feature disabled and 
1PBE enabled).

Backward compatibility:
OpenSAF versions on nodes:
SC-1 (5.0), SC-2 (5.1 FC), PL-3 (5.0), PL-4 (5.1 FC).

**Summary:** IMMD asserted on active controller after immnd crash.

**Steps followed & Observed behaviour:**

1) SC-1 has the standby role; SC-2 has the active role.
2) The following sequence of APIs and commands was called:
a) saImmOiInitialize()
b) saImmOiImplementerSet()
c) kill -9 `pidof osafimmnd`
d) saImmOiRtObjectDelete()
e) saImmOiFinalize()

Observations:

1) First, immnd asserted on the active controller in 
immnd_evt_proc_fevs_rcv.
2) Second, the active controller rebooted after an immd assertion failure.

Below is the snippet of active controller SC-2:

Sep 20 20:52:47 SCALE_SLOT-42 osafntfimcnd[15091]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: Started
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:3, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15114]: immnd_evt.c:9146: 
immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == 
cb->immnd_mdest_id) || isObjSync' failed.
Sep 20 20:52:47 SCALE_SLOT-42 python2.5: WA imma_mds_svc_evt: 
mds_auth_server_connect failed
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO MDS event from svc_id 25 
(change:4, dest:565216648273948)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO Restarting a component of 
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 6)
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery

Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: NO Extended intro from node 2020f
Sep 20 20:52:47 SCALE_SLOT-42 osafimmd[12159]: immd_evt.c:816: 
immd_accept_node: Assertion 'node_info->immnd_key != cb->node_id' failed.
Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: NO 
'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast

Sep 20 20:52:47 SCALE_SLOT-42 osafamfnd[12239]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131599, SupervisionTime = 60
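
The two assertions in the snippet above can be read as plain predicates. The following is a minimal sketch in C, assuming 64-bit MDS destinations and 32-bit node ids; the helper names and parameter types are hypothetical and are not taken from immnd_evt.c or immd_evt.c. It only shows which inputs would trip each condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the condition asserted in immnd_evt_proc_fevs_rcv:
 *   !reply_dest || (reply_dest == cb->immnd_mdest_id) || isObjSync
 * Parameter types are assumptions made for this sketch. */
static bool fevs_reply_dest_ok(uint64_t reply_dest, uint64_t own_mdest_id,
                               bool is_obj_sync)
{
    return reply_dest == 0 || reply_dest == own_mdest_id || is_obj_sync;
}

/* Mirrors the condition asserted in immd_accept_node:
 *   node_info->immnd_key != cb->node_id
 * Parameter types are assumptions made for this sketch. */
static bool immd_accept_node_ok(uint32_t immnd_key, uint32_t immd_node_id)
{
    return immnd_key != immd_node_id;
}
```

Node id 0x2020f from the "Extended intro" line is 131599 in decimal, the same value reported as OwnNodeId. If both sides of the second comparison carried that value, `immd_accept_node_ok(0x2020f, 0x2020f)` would be false, which would be consistent with the IMMD assertion logged right after the intro message.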

After the reboot, the timestamps are as below:

Sep 20 20:52:47 SCALE_SLOT-42 opensaf_reboot: Rebooting local node; timeout=60
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA DISCARD DUPLICATE FEVS 
message:12996
Sep 20 20:52:47 SCALE_SLOT-42 osafimmnd[15136]: WA Error code 2 returned for 
message type 82 - ignoring
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Sep 20 20:52:48 SCALE_SLOT-42 osafimmnd[15136]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Sep 22 02:13:05 SCALE_SLOT-42 syslog-ng[1133]: syslog-ng starting up; 
version='2.0.9'
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1650]: Backgrounding to notify hosts...
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:06 SCALE_SLOT-42 sm-notify[1651]: DNS resolution of CONN-PC 
failed; retrying later
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Version 1.2.3 starting
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Flags: TI-RPC
Sep 22 02:13:06 SCALE_SLOT-42 rpc.statd[1662]: Running as root.  chown 
/var/lib/nfs to choose different user
Sep 22 02:13:07 SCALE_SLOT-42 opensafd: Starting OpenSAF Services(5.1.FC - ) 
(Using TIPC)
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: Started
Sep 22 02:13:07 SCALE_SLOT-42 osafclmna[1745]: NO 
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Sep 22 02:13:07 SCALE_SLOT-42 osafrded[1754]: Started
Sep 22 02:13:08 SCALE_SLOT-42 osaffmd[1763]: Started


Below is the 

[tickets] [opensaf:tickets] #2031 imm:README files are missing when opensaf is downloaded

2016-09-14 Thread Neelakanta Reddy
- **status**: review --> fixed
- **Comment**:

changeset:   8074:8af3073a5dad
branch:      opensaf-5.0.x
parent:      8071:ef0846e8e5c9
user:        Neelakanta Reddy
date:        Thu Sep 15 11:57:48 2016 +0530
summary:     imm : updated Makefile to reflect all IMM README files [#2031]

changeset:   8075:b93e039d2cb2
branch:      opensaf-5.1.x
parent:      8072:c472ada0394c
user:        Neelakanta Reddy
date:        Thu Sep 15 11:57:48 2016 +0530
summary:     imm : updated Makefile to reflect all IMM README files [#2031]

changeset:   8076:03aa9b77c634
tag:         tip
parent:      8073:b7ba90304dce
user:        Neelakanta Reddy
date:        Thu Sep 15 11:57:48 2016 +0530
summary:     imm : updated Makefile to reflect all IMM README files [#2031]




---

** [tickets:#2031] imm:README files are missing when opensaf is downloaded**

**Status:** fixed
**Milestone:** 5.0.1
**Created:** Wed Sep 14, 2016 02:40 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Sep 14, 2016 08:59 AM UTC
**Owner:** Neelakanta Reddy


Update the Makefile.am with all README files


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1816 IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when ERR_LIBRARY was expected

2016-09-14 Thread Neelakanta Reddy
- **status**: review --> fixed
- **Comment**:

changeset:   8073:b7ba90304dce
tag:         tip
parent:      8069:b30d5e33e50c
user:        Neelakanta Reddy
date:        Thu Sep 15 11:45:40 2016 +0530
summary:     imm: return the correct error code for ERR_LIBRARY in saImmOiAugmentCcbInitialize [#1816]

changeset:   8072:c472ada0394c
branch:      opensaf-5.1.x
parent:      8068:87a09d9164d3
user:        Neelakanta Reddy
date:        Thu Sep 15 11:45:40 2016 +0530
summary:     imm: return the correct error code for ERR_LIBRARY in saImmOiAugmentCcbInitialize [#1816]

changeset:   8071:ef0846e8e5c9
branch:      opensaf-5.0.x
parent:      8067:efeaffca9483
user:        Neelakanta Reddy
date:        Thu Sep 15 11:45:40 2016 +0530
summary:     imm: return the correct error code for ERR_LIBRARY in saImmOiAugmentCcbInitialize [#1816]

changeset:   8070:af5ecf3d1a72
branch:      opensaf-4.7.x
parent:      8066:afddc603adcb
user:        Neelakanta Reddy
date:        Thu Sep 15 11:45:40 2016 +0530
summary:     imm: return the correct error code for ERR_LIBRARY in saImmOiAugmentCcbInitialize [#1816]




---

** [tickets:#1816] IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when 
ERR_LIBRARY was expected**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Mon May 09, 2016 07:27 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 11:47 AM UTC
**Owner:** Neelakanta Reddy


This was found as part of validating ticket #1808

Code snippet: imma_oi_api.c:3749

~~~
if (immsv_om_handle_initialize) { /* This is always the first immsv_om_ call */
    rc = immsv_om_handle_initialize(&privateOmHandle, &version);
} else {
    TRACE("ERR_LIBRARY: Error in library linkage. libSaImmOm.so is not linked");
    rc = SA_AIS_ERR_LIBRARY;
}

if (rc != SA_AIS_OK) {
    TRACE("ERR_TRY_AGAIN: failed to obtain internal om handle rc:%u", rc);
    rc = SA_AIS_ERR_TRY_AGAIN;
    goto lock_fail; /* We are not locked and nothing to de-allocate. */
}
~~~

When rc is set to SA_AIS_ERR_LIBRARY in the else branch, there is no goto, so 
the next if condition is evaluated as well. Because rc is not SA_AIS_OK, it is 
overwritten with SA_AIS_ERR_TRY_AGAIN.
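
One way to keep ERR_LIBRARY from being overwritten is to leave the function as soon as the linkage check fails. The following is a minimal, self-contained sketch of that control flow. It is not the committed patch: the enum is a local stand-in for the SA Forum `SaAisErrorT` values, and `om_init_fn`, `obtain_om_handle`, `om_init_ok` and `om_init_timeout` are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

/* Local stand-ins so the sketch compiles without OpenSAF headers; the
 * numeric values follow the SA Forum AIS SaAisErrorT enumeration. */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_LIBRARY = 2,
    SA_AIS_ERR_TIMEOUT = 5,
    SA_AIS_ERR_TRY_AGAIN = 6
} SaAisErrorT;

/* Hypothetical signature standing in for immsv_om_handle_initialize. */
typedef SaAisErrorT (*om_init_fn)(void);

/* Example initializers (hypothetical) used to exercise both paths. */
static SaAisErrorT om_init_ok(void) { return SA_AIS_OK; }
static SaAisErrorT om_init_timeout(void) { return SA_AIS_ERR_TIMEOUT; }

/* Returns the error code the caller should see. */
static SaAisErrorT obtain_om_handle(om_init_fn om_init)
{
    SaAisErrorT rc;

    if (om_init == NULL) {
        /* Library not linked: return before the generic check below
         * can overwrite the code with ERR_TRY_AGAIN. */
        return SA_AIS_ERR_LIBRARY;
    }

    rc = om_init();
    if (rc != SA_AIS_OK) {
        /* Any failure of the initializer itself is reported as retryable. */
        return SA_AIS_ERR_TRY_AGAIN;
    }
    return SA_AIS_OK;
}
```

With this shape, `obtain_om_handle(NULL)` yields SA_AIS_ERR_LIBRARY, while a failing initializer such as `om_init_timeout` still maps to SA_AIS_ERR_TRY_AGAIN.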



---



[tickets] [opensaf:tickets] #2036 build : make rpm fails, if installation directories are specified

2016-09-14 Thread Srikanth R



---

** [tickets:#2036] build : make rpm fails, if installation directories are 
specified**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 15, 2016 06:03 AM UTC by Srikanth R
**Last Updated:** Thu Sep 15, 2016 06:03 AM UTC
**Owner:** nobody


Environment : 
Setup : SLES 64bit gcc 6.1

Steps performed :

Ran the following commands after downloading OpenSAF from hg:
-> ./bootstrap.sh
-> ./configure CFLAGS="-g " CXXFLAGS="-g " --enable-tipc --enable-imm-pbe --enable-ntf-imcn --sysconfdir=/opt/etc --localstatedir=/opt/var --libdir=/opt/usr/lib
-> make rpm

The last step fails with the following error:
 
 
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root
error: Installed (but unpackaged) file(s) found:
   /opt/etc/opensaf/amfd.conf
   /opt/etc/opensaf/amfnd.conf
   /opt/etc/opensaf/amfwdog.conf
   /opt/etc/opensaf/chassis_id
   /opt/etc/opensaf/ckptd.conf
   /opt/etc/opensaf/ckptnd.conf
   /opt/etc/opensaf/clmd.conf
   /opt/etc/opensaf/clmna.conf
   /opt/etc/opensaf/dtmd.conf


RPM build errors:
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/lib/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/log/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/var/run/opensaf
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/chassis_id
File not found: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/etc/opensaf/slot_id
.
File not found by glob: /home/pinv/Srikanth/SAF_7997/rpms/tmp/opensaf-5.1.FC-1-root-root/usr/lib64/libSa*.a
Installed (but unpackaged) file(s) found:
   /opt/etc/opensaf/amfd.conf
   /opt/etc/opensaf/amfnd.conf
   /opt/etc/opensaf/amfwdog.conf
   /opt/etc/opensaf/chassis_id
   /opt/etc/opensaf/ckptd.conf
   /opt/etc/opensaf/ckptnd.conf
   /opt/etc/opensaf/clmd.conf
   /opt/etc/opensaf/clmna.conf
   /opt/etc/opensaf/dtmd.conf
   ...
   /opt/usr/lib/pkgconfig/opensaf-smf.pc
   /opt/usr/lib/pkgconfig/opensaf.pc
make: *** [rpm] Error 1






---



Re: [tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-14 Thread A V Mahesh

Hi Jonas,

OK, I just pushed it; please test once on 4.7:



branch:      opensaf-4.7.x
parent:      8043:4a8a00097561
user:        A V Mahesh
date:        Thu Sep 15 10:50:31 2016 +0530
summary:     dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]



-AVM

On 9/15/2016 12:08 AM, Jonas Arndt wrote:


Mahesh,

Can we get this back-ported to 4.7.x as well?

Cheers,

// Jonas








[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-14 Thread A V Mahesh (AVM)
- **status**: review --> fixed
- **Milestone**: 5.0.1 --> 4.7.2
- **Comment**:

changeset:   8066:afddc603adcb
branch:      opensaf-4.7.x
parent:      8043:4a8a00097561
user:        A V Mahesh
date:        Thu Sep 15 10:50:31 2016 +0530
summary:     dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]

changeset:   8067:efeaffca9483
branch:      opensaf-5.0.x
parent:      8049:28129451fd38
user:        A V Mahesh
date:        Thu Sep 15 10:52:03 2016 +0530
summary:     dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]

changeset:   8068:87a09d9164d3
branch:      opensaf-5.1.x
parent:      8065:019e617955ef
user:        A V Mahesh
date:        Thu Sep 15 10:52:32 2016 +0530
summary:     dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]

changeset:   8069:b30d5e33e50c
tag:         tip
parent:      8064:99410ba8cc21
user:        A V Mahesh
date:        Thu Sep 15 10:52:49 2016 +0530
summary:     dtm: TCP Improve node failFast with TCP_USER_TIMEOUT [#2014]



---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Wed Sep 14, 2016 06:38 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


OS environment:

Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network eth0, bonded, OVS (I have tried all of them and the problem is 
there in all configurations)


In 20% of the cases, a "reboot -f" on controller2 is not detected and acted on. 
The mds.log shows the following:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>

Still, there is nothing in the syslog indicating that controller2 has left the 
cluster. This is for TCP.
When the node comes back online (without OpenSAF being started), controller1 
finally notices and fails over the apps.

When the reboot is not detected, the TCP keepalives stop and the connection goes 
into retransmits instead. I have attached two tshark sessions captured from 
controller1, capturing traffic between controller1 and controller2. The failed 
reboot detection is captured in "ctrl2_failed_detection.trc", and for a working 
detection there is a file "ctrl2_working.trc". I have also attached all logs in 
/var/log/opensaf and the syslog (all from controller1).
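
The changeset summaries for this ticket mention TCP_USER_TIMEOUT, a Linux socket option that bounds how long transmitted data may remain unacknowledged before the connection is closed, so pending retransmissions cannot postpone failure detection indefinitely. The following is a minimal sketch of setting it next to SO_KEEPALIVE. It is not the dtm patch: the helper name `enable_fail_fast` and the timeout value in the usage note are assumptions for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive and bound the time that sent data
 * may stay unacknowledged. Returns 0 on success, -1 on failure (errno is
 * set by setsockopt). TCP_USER_TIMEOUT is Linux-specific. */
static int enable_fail_fast(int fd, unsigned int user_timeout_ms)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;

    /* The option value is an unsigned int expressed in milliseconds. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
                   sizeof(user_timeout_ms)) != 0)
        return -1;

    return 0;
}
```

A caller would invoke something like `enable_fail_fast(fd, 4000)` on each connected socket; the 4000 ms figure is only an example, and a suitable value depends on the keepalive settings in use.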

It appears to me that we are hitting something similar to 
"http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect".

// Jonas


---



[tickets] [opensaf:tickets] #2022 AMF : amfd asserted for NG lock operation ( quiesced timeout - Nway model))

2016-09-14 Thread Srikanth R
Attaching the logs. 


Attachments:

- 
[2022.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/2d0d5691/727c/attachment/2022.tgz)
 (1.1 MB; application/x-compressed-tar)


---

** [tickets:#2022] AMF : amfd asserted for NG lock operation ( quiesced timeout 
- Nway model))**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 09:58 AM UTC by Srikanth R
**Last Updated:** Mon Sep 12, 2016 07:21 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/2022/attachment/createAppTestApp.sh)
 (15.8 kB; text/x-shellscript)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : NPM model with SUs mapped on SC-2,PL-3,PL-4


Summary :
--
AMFD on both controllers asserted when an Nway application failed in the CSI SET 
QUIESCED callback during a lock operation on a node group.


Steps followed & Observed behaviour
--

-> Hosted the Nway application on PL-3, PL-4 and SC-2 and brought it up. The 
configuration is attached to the ticket.
-> Created a node group containing all three nodes.
-> Ensured that one of the components would not respond to the quiesced callback.
-> Performed the lock operation on the node group.
-> amfd on both controllers asserted with the following backtrace.


0  0x7f66fbc6fb55 in raise () from /lib64/libc.so.6
1  0x7f66fbc71131 in abort () from /lib64/libc.so.6
2  0x7f66fda6816a in __osafassert_fail (__file=0x51214d "su.cc", __line=2022, __func=0x513aa0 "dec_curr_stdby_si", __assertion=0x51355f "saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:281
3  0x004d68cd in AVD_SU::dec_curr_stdby_si (this=0x7ccf40) at su.cc:2022
4  0x004be804 in avd_susi_update_assignment_counters (susi=0x78c670, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at siass.cc:783
5  0x004be59b in avd_susi_del_send (susi=0x78c670) at siass.cc:714
6  0x004af12e in avd_sg_nway_node_fail_stable (cb=0x751b80, su=0x800470, susi=0x0) at sg_nway_fsm.cc:3022
7  0x004b025d in avd_sg_nway_node_fail_sg_realign (cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:3493
8  0x004a8042 in SG_NWAY::node_fail (this=0x797c50, cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:497
9  0x004b209e in sg_su_failover_func (su=0x800470) at sgproc.cc:525
10 0x004b2d16 in avd_su_oper_state_evh (cb=0x751b80, evt=0x7f66f4002940) at sgproc.cc:838
11 0x00450ba9 in process_event (cb_now=0x751b80, evt=0x7f66f4002940) at main.cc:768
12 0x004508cd in main_loop () at main.cc:689
13 0x00450e43 in main (argc=2, argv=0x7fff0f81ab18) at main.cc:841







---



[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)

2016-09-14 Thread Srikanth R
If IMM has a maximum limit of 2048 for a long DN object, then AMF should reject 
the creation of application objects by calculating the size of the RT objects.
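
The size calculation suggested above could look roughly like the following sketch. It assumes the RT object DN is formed as "<rdn>,<parent DN>" and treats the 2048 limit as inclusive; both are assumptions, and `rt_object_dn_fits` and `ASSUMED_MAX_DN_LEN` are hypothetical names, not AMF or IMM identifiers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Limit quoted in the comment above; treated here as an assumption. */
#define ASSUMED_MAX_DN_LEN 2048

/* Hypothetical helper: would "<rdn>,<parent_dn>" fit within max_len bytes? */
static bool rt_object_dn_fits(const char *rdn, const char *parent_dn,
                              size_t max_len)
{
    /* +1 accounts for the ',' separating the RDN from the parent DN. */
    size_t total = strlen(rdn) + 1 + strlen(parent_dn);

    return total <= max_len;
}
```

A check of this kind, run when the configuration objects are created, would let the request be rejected up front instead of failing later when the runtime object is created.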


---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG 
(13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 01:01 AM UTC
**Owner:** nobody
**Attachments:**

- 
[2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) 
(159.7 kB; application/x-compressed-tar)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE  & longDn feature enabled )
AMF Application : 2N model with SUs mapped on PL-3,PL-4


Summary :
--
 Long DN RT objects creation failed with ERR_TOO_LONG during unlock operation 
of SU.


Steps followed & Observed behaviour
--

-> Initially enabled the longDn feature.

-> Then imported the attached AMF configuration successfully.

-> Then performed the unlock-in and unlock operations on the SU, after which the 
following error was observed in syslog.

Sep 10 16:11:43 CONTROLLER-2 osafamfnd[4279]: NO Assigned 
'safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
 ACTIVE to 'safSu=SU1,safSg=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq
 
rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
Sep 10 16:11:43 CONTROLLER-2 osafamfd[4265]: ER exec: create FAILED 13
Sep 10 16:11:46 CONTROLLER-2 osafamfd[4265]:** ER exec: create FAILED 13**


Below is the corresponding trace in osafamfd :


Sep 10 16:11:46.647681 osafamfd [4265:imm.cc:0396] >> execute
Sep 10 16:11:46.647730 osafamfd [4265:imm.cc:0142] >> exec: Create 
safCsi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz_CSIA,safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxy
 zabcdefghijklmnopqrstuvT
Sep 10 16:11:46.647783 osafamfd [4265:imma_oi_api.c:2786] >> 
rt_object_create_common
Sep 10 16:11:46.647879 osafamfd [4265:imma_oi_api.c:2892] TR attr:safCSIComp
Sep 10 16:11:46.647908 osafamfd [4265:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAState
Sep 10 16:11:46.647927 osafamfd [4265:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAReadinessState
Sep 10 16:11:46.649108 osafamfd [4265:imma_oi_api.c:3063] << 
rt_object_create_common
Sep 10 16:11:46.649157 osafamfd [4265:imm.cc:0163] ER exec: create FAILED 13
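
The number in `create FAILED 13` is an `SaAisErrorT` value. In the SA Forum AIS header (saAis.h), 13 is SA_AIS_ERR_NAME_TOO_LONG, which would be consistent with the long DNs used here. A minimal sketch for decoding such codes from log lines follows; the table is only a partial excerpt, and `decode_ais_error` is a hypothetical helper, not part of OpenSAF.

```python
import re

# Partial excerpt of SaAisErrorT values as defined in the SA Forum AIS
# header saAis.h. Only a few codes are listed here.
AIS_ERRORS = {
    1: "SA_AIS_OK",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    7: "SA_AIS_ERR_INVALID_PARAM",
    9: "SA_AIS_ERR_BAD_HANDLE",
    12: "SA_AIS_ERR_NOT_EXIST",
    13: "SA_AIS_ERR_NAME_TOO_LONG",
    14: "SA_AIS_ERR_EXIST",
}


def decode_ais_error(log_line):
    """Hypothetical helper: return the symbolic name for 'FAILED <n>' in a log line.

    Returns None if the line carries no such code.
    """
    match = re.search(r"FAILED (\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return AIS_ERRORS.get(code, "unknown (%d)" % code)


line = "Sep 10 16:11:43 CONTROLLER-2 osafamfd[4265]: ER exec: create FAILED 13"
print(decode_ais_error(line))
```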




---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a proje

[tickets] [opensaf:tickets] #2035 osaf: Convert transport monitor script to a daemon

2016-09-14 Thread A V Mahesh (AVM)
- **summary**: dtm: Convert transport monitor script to a daemon --> osaf: 
Convert transport monitor script to a daemon
- **Comment**:

Not specific to TCP or TIPC



---

** [tickets:#2035] osaf: Convert transport monitor script to a daemon**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Wed Sep 14, 2016 10:43 AM UTC by Anders Widell
**Last Updated:** Wed Sep 14, 2016 10:43 AM UTC
**Owner:** Anders Widell


As a first step in implementing ticket [#2015], convert the 
osaf-transport-monitor shell script to a daemon.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1895 ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found

2016-09-14 Thread Canh Truong
- Description has changed:

Diff:



--- old
+++ new
@@ -1,12 +1,19 @@
 Failed when run test suit 2: 
 
-osafntfd [463:ntfs_evt.c:0338] >> proc_unsubscribe_msg: client_id 28, 
subscriptionId 111
-osafntfd [463:NtfAdmin.cc:0553] ER NtfAdmin::subscriptionRemoved client 28 not 
found
-osafntfd [463:ntfs_evt.c:0341] << proc_unsubscribe_msg 
+Sep 15 11:13:41.191010 osafntfd [476:mds_dt_trans.c:0608] >> 
mdtm_process_poll_recv_data_tcp 
+Sep 15 11:13:41.191034 osafntfd [476:mbcsv_mds.c:0244] << mbcsv_mds_send_msg: 
success
+Sep 15 11:13:41.191038 osafntfd [476:mbcsv_util.c:0492] << 
mbcsv_send_ckpt_data_to_all_peers 
+Sep 15 11:13:41.191041 osafntfd [476:mbcsv_api.c:0868] << 
mbcsv_process_snd_ckpt_request: retval: 1
+Sep 15 11:13:41.191044 osafntfd [476:ntfs_mbcsv.c:1365] << 
ntfs_send_async_update 
+Sep 15 11:13:41.191047 osafntfd [476:ntfs_com.c:0093] << 
client_removed_res_lib 
+Sep 15 11:13:41.191049 osafntfd [476:ntfs_evt.c:0304] << proc_finalize_msg 
+Sep 15 11:13:41.191055 osafntfd [476:ntfs_evt.c:0357] >> proc_unsubscribe_msg: 
client_id 8, subscriptionId 111
+Sep 15 11:13:41.191140 osafntfd [476:NtfAdmin.cc:0520] ER 
NtfAdmin::subscriptionRemoved client 8 not found
+Sep 15 11:13:41.191146 osafntfd [476:ntfs_evt.c:0369] << proc_unsubscribe_msg 
 
-Currently, when finalizing the last client, ntfa uninstall MDS connection.
-This causes that the NCSMDS_DOWN event will be sent to ntfs. ntfs will remove 
all clients that relates to this MDS.
-But if we initializes new client immediately after finalizing, ntfs may 
reviece the message of initialization before message of NCSMDS_DOWN event. This 
cause new client will be removed without finalizing and then action subcribe 
failed.
+The issue here is when test case 4 of test suit2 is executed, the unsubcribe 
and Finalize is runing in parallel. Ntfd receive API request from Finalize 
before unsubcribe, so client and all of relating to its will be removed before 
ntfd process request from ubsubcribe. 
+The error is printed out in this case. And unsubcribe action is failed. But in 
the test case the return code is re-assigned to ok for test case passed.
 
+The step in test case to call API in ntf may be wrong.
 
 Similiar ticket: https://sourceforge.net/p/opensaf/tickets/1818/



- **status**: review --> accepted
- Attachments has changed:

Diff:



--- old
+++ new
@@ -0,0 +1 @@
+1895.tgz (471.5 kB; application/x-compressed)






---

** [tickets:#1895] ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 
12 not found**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Fri Jun 24, 2016 03:24 AM UTC by Vo Minh Hoang
**Last Updated:** Wed Sep 14, 2016 03:07 AM UTC
**Owner:** Canh Truong
**Attachments:**

- 
[1895.tgz](https://sourceforge.net/p/opensaf/tickets/1895/attachment/1895.tgz) 
(471.5 kB; application/x-compressed)


Fails when running test suite 2: 

Sep 15 11:13:41.191010 osafntfd [476:mds_dt_trans.c:0608] >> 
mdtm_process_poll_recv_data_tcp 
Sep 15 11:13:41.191034 osafntfd [476:mbcsv_mds.c:0244] << mbcsv_mds_send_msg: 
success
Sep 15 11:13:41.191038 osafntfd [476:mbcsv_util.c:0492] << 
mbcsv_send_ckpt_data_to_all_peers 
Sep 15 11:13:41.191041 osafntfd [476:mbcsv_api.c:0868] << 
mbcsv_process_snd_ckpt_request: retval: 1
Sep 15 11:13:41.191044 osafntfd [476:ntfs_mbcsv.c:1365] << 
ntfs_send_async_update 
Sep 15 11:13:41.191047 osafntfd [476:ntfs_com.c:0093] << client_removed_res_lib 
Sep 15 11:13:41.191049 osafntfd [476:ntfs_evt.c:0304] << proc_finalize_msg 
Sep 15 11:13:41.191055 osafntfd [476:ntfs_evt.c:0357] >> proc_unsubscribe_msg: 
client_id 8, subscriptionId 111
Sep 15 11:13:41.191140 osafntfd [476:NtfAdmin.cc:0520] ER 
NtfAdmin::subscriptionRemoved client 8 not found
Sep 15 11:13:41.191146 osafntfd [476:ntfs_evt.c:0369] << proc_unsubscribe_msg 

The issue is that when test case 4 of test suite 2 is executed, unsubscribe 
and Finalize run in parallel. ntfd receives the Finalize request before the 
unsubscribe request, so the client and everything related to it are removed 
before ntfd processes the unsubscribe request. 
The error is printed in this case, and the unsubscribe action fails. But in 
the test case the return code is re-assigned to OK so that the test case passes.

The order in which the test case calls the NTF API may be wrong.
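
The ordering problem described above can be sketched with a toy model. This is not ntfd code: `NtfServerModel` and `run` are hypothetical names, and the model only keeps a client table so that the effect of processing Finalize before unsubscribe becomes visible.

```python
class NtfServerModel:
    """Toy stand-in for ntfd (an assumption, not the real server)."""

    def __init__(self):
        self.clients = {}  # client_id -> set of subscription ids

    def initialize(self, client_id):
        self.clients[client_id] = set()

    def subscribe(self, client_id, subscription_id):
        self.clients[client_id].add(subscription_id)

    def finalize(self, client_id):
        # Removes the client together with all of its subscriptions.
        self.clients.pop(client_id, None)

    def unsubscribe(self, client_id, subscription_id):
        subscriptions = self.clients.get(client_id)
        if subscriptions is None:
            return "ER subscriptionRemoved client %d not found" % client_id
        subscriptions.discard(subscription_id)
        return "OK"


def run(order):
    """Process the requests in the given arrival order; return the unsubscribe result."""
    server = NtfServerModel()
    server.initialize(8)
    server.subscribe(8, 111)
    result = None
    for request in order:
        if request == "finalize":
            server.finalize(8)
        elif request == "unsubscribe":
            result = server.unsubscribe(8, 111)
    return result


print(run(["unsubscribe", "finalize"]))
print(run(["finalize", "unsubscribe"]))
```

In this model the second ordering reproduces the "client 8 not found" message, which suggests the test case should wait for unsubscribe to return before calling Finalize.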

Similar ticket: https://sourceforge.net/p/opensaf/tickets/1818/




[tickets] [opensaf:tickets] #1895 ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found

2016-09-14 Thread Canh Truong
- Attachments has changed:

Diff:



--- old
+++ new
@@ -1 +0,0 @@
-osafntfd.txt (705.5 kB; text/plain)






---

** [tickets:#1895] ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 
12 not found**

**Status:** review
**Milestone:** 4.7.2
**Created:** Fri Jun 24, 2016 03:24 AM UTC by Vo Minh Hoang
**Last Updated:** Wed Sep 14, 2016 03:07 AM UTC
**Owner:** Canh Truong


Fails when running test suite 2: 

osafntfd [463:ntfs_evt.c:0338] >> proc_unsubscribe_msg: client_id 28, 
subscriptionId 111
osafntfd [463:NtfAdmin.cc:0553] ER NtfAdmin::subscriptionRemoved client 28 not 
found
osafntfd [463:ntfs_evt.c:0341] << proc_unsubscribe_msg 

Currently, when finalizing the last client, ntfa uninstalls the MDS connection.
This causes the NCSMDS_DOWN event to be sent to ntfs, and ntfs removes all 
clients related to this MDS.
But if a new client is initialized immediately after finalizing, ntfs may 
receive the initialization message before the NCSMDS_DOWN event. The new 
client is then removed without being finalized, and the subsequent subscribe fails.


Similar ticket: https://sourceforge.net/p/opensaf/tickets/1818/




[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-14 Thread Jonas Arndt
Mahesh,

Can we get this back-ported to 4.7.x as well?

Cheers,

// Jonas


---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** review
**Milestone:** 5.0.1
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Wed Sep 14, 2016 04:51 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


OS environment:

Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network eth0, bonded, OVS (I have tried all of them and the problem is 
there in all configurations)


In 20% of the cases, a "reboot -f" on controller2 is not detected and acted on. 
This is what appears in mds.log:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>

Still, there is nothing in the syslog indicating that controller2 has left the 
cluster. This is for TCP.
When the node comes back online (without OpenSAF being started), controller1 
finally notices and fails over the applications. 

When the reboot is not detected, the TCP keepalives stop and the connection goes 
into retransmits instead. I have attached 2 tshark sessions captured from 
controller1, capturing traffic between controller1 and controller2. The failed 
reboot detection is captured in "ctrl2_failed_detection.trc", and for a working 
detection there is a file "ctrl2_working.trc". I have also attached all logs in 
/var/log/opensaf and the syslog (all from controller1).
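
One common way to bound how long a connection can sit in retransmits is the Linux TCP_USER_TIMEOUT socket option, which the name of the attached patch also hints at. The sketch below is not the content of that patch. It only shows the socket options involved, the timer values are arbitrary, and `configure_dead_peer_detection` is a hypothetical helper.

```python
import socket


def configure_dead_peer_detection(sock, idle_s=2, interval_s=1, probes=3,
                                  user_timeout_ms=5000):
    """Hypothetical helper: enable keepalive and, where available, TCP_USER_TIMEOUT.

    Returns a dict of the options that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # The keepalive tuning constants are platform dependent, so probe for them.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value

    # TCP_USER_TIMEOUT (Linux) limits how long transmitted data may stay
    # unacknowledged before the kernel closes the connection.
    option = getattr(socket, "TCP_USER_TIMEOUT", None)
    if option is not None:
        sock.setsockopt(socket.IPPROTO_TCP, option, user_timeout_ms)
        applied["TCP_USER_TIMEOUT"] = user_timeout_ms
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(configure_dead_peer_detection(sock))
sock.close()
```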

It appears to me that we are hitting something similar like 
"http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect";

// Jonas




[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT

2016-09-14 Thread Neelakanta Reddy
- **status**: assigned --> wontfix



---

** [tickets:#2001] IMM: Owner handle is getting corrupt when 
OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** wontfix
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 14, 2016 12:33 PM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback for 
longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait, OR invoke 
any Ccb operation

Observed Behavior:
Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 
svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over 
MDS. Discarding admin op reply.
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 
21 - ignoring
Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down 
on syncronous request, discarding request
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 
2010f> (testOiTmout_verifyAdminOpCallback_37)


Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached




[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT

2016-09-14 Thread Neelakanta Reddy
The attached logs are for SyncAdminOperation, but the problem description in the 
ticket is about AsyncAdminOperation.

1. The First sync admin operation happened:

Sep 14 12:22:32.664368 imma [24880:imma_om_api.c:3675] >> admin_op_invoke_common
Sep 14 12:22:32.664403 imma [24880:imma_om_api.c:3823] TR immInvocations:0
Sep 14 12:22:32.664414 imma [24880:imma_om_api.c:3837] TR 
PARAM:testOiTmout_verifyAdminOpCallback_37
Sep 14 12:22:32.95 imma [24880:imma_proc.c:1363] TR  Event type:6
Sep 14 12:22:32.666738 imma [24880:imma_proc.c:1256] >> imma_proc_free_pointers
Sep 14 12:22:32.666754 imma [24880:imma_proc.c:1349] << imma_proc_free_pointers
Sep 14 12:22:32.666789 imma [24880:imma_db.c:0187] >> imma_oi_ccb_record_find
Sep 14 12:22:32.666809 imma [24880:imma_db.c:0198] << imma_oi_ccb_record_find
Sep 14 12:22:32.666821 imma [24880:imma_proc.c:1932] >> 
imma_process_callback_info
Sep 14 12:22:47.681771 imma [24880:imma_om_api.c:3886] TR Fevs send RETURNED:5
Sep 14 12:22:47.681849 imma [24880:imma_om_api.c:4031] << admin_op_invoke_common

2. The operation timed out waiting for a reply from the implementer, which in turn 
caused the connection to be discarded:
Sep 14 12:22:45.941668 osafimmnd [24552:ImmModel.cc:14107] TR Checking active 
ccb 2 for deadlock or blocked implementer
Sep 14 12:22:45.941680 osafimmnd [24552:ImmModel.cc:14109] TR state:1 
waitsart:0.00 PberestartId:0
Sep 14 12:22:46.947346 osafimmnd [24552:ImmModel.cc:14062] T5 Did not timeout 
now - start < 16(14.282737)
Sep 14 12:22:46.947479 osafimmnd [24552:ImmModel.cc:14107] TR Checking active 
ccb 2 for deadlock or blocked implementer
Sep 14 12:22:46.947502 osafimmnd [24552:ImmModel.cc:14109] TR state:1 
waitsart:0.00 PberestartId:0
Sep 14 12:22:47.682750 osafimmnd [24552:immsv_evt.c:5422] T8 Received: 
IMMND_EVT_A2ND_CL_TIMEOUT (93) from 2010f
Sep 14 12:22:47.682797 osafimmnd [24552:immnd_evt.c:2114] >> 
immnd_evt_proc_cl_imma_timeout
Sep 14 12:22:47.682809 osafimmnd [24552:immnd_evt.c:2116] T2 timeout in imma 
library for handle: 21f0002010f
Sep 14 12:22:47.682826 osafimmnd [24552:ImmModel.cc:13928] >> purgeSyncRequest
Sep 14 12:22:47.682848 osafimmnd [24552:ImmModel.cc:13960] T5 Purged syncronous 
Admin-op continuation
Sep 14 12:22:47.682862 osafimmnd [24552:ImmModel.cc:14034] << purgeSyncRequest
Sep 14 12:22:47.682872 osafimmnd [24552:immnd_proc.c:0090] >> 
immnd_proc_imma_discard_connection
Sep 14 12:22:47.682884 osafimmnd [24552:immnd_proc.c:0094] T5 Attempting 
discard connection id:21f0002010f 

3. When the second synchronous admin operation is called, BAD_HANDLE is 
returned by IMM, because the client connection was discarded in IMM due to the 
timeout.

Sep 14 12:22:52.693042 osafimmnd [24552:immsv_evt.c:5422] T8 Received: 
IMMND_EVT_A2ND_IMM_FEVS (14) from 2010f
Sep 14 12:22:52.693095 osafimmnd [24552:immnd_evt.c:2932] T2 sender_count: 
21474836482 size: 144
Sep 14 12:22:52.693156 osafimmnd [24552:immnd_evt.c:2940] WA IMMND - Client 
2332167373071 went down on syncronous request, discarding request
Sep 14 12:22:52.693171 osafimmnd [24552:immnd_evt.c:3178] T2 SENDRSP FAIL 9


Sep 14 12:22:52.692640 imma [24880:imma_om_api.c:3675] >> admin_op_invoke_common
Sep 14 12:22:52.692713 imma [24880:imma_om_api.c:3823] TR immInvocations:1
Sep 14 12:22:52.692727 imma [24880:imma_om_api.c:3837] TR 
PARAM:testOiTmout_verifyAdminOpCallback_37
Sep 14 12:22:52.694767 imma [24880:imma_om_api.c:3886] TR Fevs send RETURNED:1
Sep 14 12:22:52.695056 imma [24880:imma_om_api.c:3897] TR ERROR returned:9
Sep 14 12:22:52.695089 imma [24880:imma_om_api.c:4031] << admin_op_invoke_common


According to the shared logs, the sync admin operation is working as 
designed.
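
The three steps above can be summarized in a toy model. This is not IMM code: `ImmndModel` and `invoke_with_recovery` are hypothetical, and the only values taken from the traces are the return codes 5 (SA_AIS_ERR_TIMEOUT) and 9 (SA_AIS_ERR_BAD_HANDLE). Re-initializing the handle after BAD_HANDLE is shown as one possible way for an application to react, not as documented IMM guidance.

```python
SA_AIS_OK = 1
SA_AIS_ERR_TIMEOUT = 5
SA_AIS_ERR_BAD_HANDLE = 9


class ImmndModel:
    """Toy model (an assumption): a timed-out sync request discards the client connection."""

    def __init__(self):
        self.connections = set()
        self._next_handle = 1

    def initialize(self):
        handle = self._next_handle
        self._next_handle += 1
        self.connections.add(handle)
        return handle

    def admin_op_invoke(self, handle, implementer_delay_s, timeout_s):
        if handle not in self.connections:
            return SA_AIS_ERR_BAD_HANDLE
        if implementer_delay_s > timeout_s:
            # Mirrors "Attempting discard connection" in the immnd trace.
            self.connections.discard(handle)
            return SA_AIS_ERR_TIMEOUT
        return SA_AIS_OK


def invoke_with_recovery(server, handle, implementer_delay_s, timeout_s):
    """Retry once with a fresh handle if the old one was discarded."""
    rc = server.admin_op_invoke(handle, implementer_delay_s, timeout_s)
    if rc == SA_AIS_ERR_BAD_HANDLE:
        handle = server.initialize()
        rc = server.admin_op_invoke(handle, implementer_delay_s, timeout_s)
    return handle, rc


server = ImmndModel()
handle = server.initialize()
print(server.admin_op_invoke(handle, 20, 15))  # implementer slower than the timeout
print(server.admin_op_invoke(handle, 0, 15))   # same handle, after the discard
print(invoke_with_recovery(server, handle, 0, 15))
```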


---

** [tickets:#2001] IMM: Owner handle is getting corrupt when 
OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 14, 2016 07:06 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback for 
longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait, OR invoke 
any Ccb operation

Observed Behavior:
Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 
svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client w

[tickets] [opensaf:tickets] #2035 dtm: Convert transport monitor script to a daemon

2016-09-14 Thread Anders Widell



---

** [tickets:#2035] dtm: Convert transport monitor script to a daemon**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Wed Sep 14, 2016 10:43 AM UTC by Anders Widell
**Last Updated:** Wed Sep 14, 2016 10:43 AM UTC
**Owner:** Anders Widell


As a first step in implementing ticket [#2015], convert the 
osaf-transport-monitor shell script to a daemon.




[tickets] [opensaf:tickets] #2031 imm:README files are missing when opensaf is downloaded

2016-09-14 Thread Neelakanta Reddy
- **status**: accepted --> review



---

** [tickets:#2031] imm:README files are missing when opensaf is downloaded**

**Status:** review
**Milestone:** 5.0.1
**Created:** Wed Sep 14, 2016 02:40 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Sep 14, 2016 02:40 AM UTC
**Owner:** Neelakanta Reddy


Update the Makefile.am with all README files




[tickets] [opensaf:tickets] #2021 AMF : active compname is improperly populated in Standby callback (NPM)

2016-09-14 Thread Praveen
- **status**: unassigned --> assigned
- **assigned_to**: Praveen



---

** [tickets:#2021] AMF :  active compname is improperly populated in Standby 
callback (NPM)**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 06:52 AM UTC by Srikanth R
**Last Updated:** Sat Sep 10, 2016 06:52 AM UTC
**Owner:** Praveen


For an application with the NPM model, the active compName in the standby descriptor 
has a corrupted value in the standby callback.


Breakpoint 1, pycbk_SaAmfCSISetCallbackT (invocation=4287627278, 
compName=0x941a28, haState=SA_AMF_HA_STANDBY, csiDescriptor=...) at 
saAmf_wrap.c:2914
2914    saAmf_wrap.c: No such file or directory.
(gdb) p csiDescriptor 
$1 = {csiFlags = 1, csiName = {length = 48, value = 
"safCsi=CSI1,safSi=TestApp_SI4,safApp=TestApp_Npm", '\000' }, csiStateDescriptor = {activeDescriptor = {transitionDescriptor = 
1634926660, 
  activeCompName = {length = 0, value = 
"\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm",
 '\000' }}, standbyDescriptor = {activeCompName = {
length = 68, value = 
"**sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm**",
 '\000' }, standbyRank = 0}}, csiAttr = {attr = 0x7642a0, 
number = 1}}


In the above callback (in gdb), the active component name in the standby 
descriptor of the standby callback should be 
safComp=COMP1,safSu=TestApp_SU3,safSg=TestApp_SG1,safApp=TestApp_Npm, but it 
is populated with an improper value:
 
sa\000\000\000mp=CO\000\000\000\000\000\000\000\000u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npmapo
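
To make the corruption pattern easier to see, the sketch below locates the runs of NUL bytes in the value printed by gdb for the standby descriptor. The byte string is transcribed from the gdb output above, `nul_runs` is a hypothetical helper, and the comparison against the first 18 characters of the expected DN is only an observation, not a diagnosis of the cause.

```python
def nul_runs(value):
    """Return half-open (start, end) index ranges of consecutive NUL bytes."""
    runs = []
    start = None
    for index, byte in enumerate(value):
        if byte == 0:
            if start is None:
                start = index
        elif start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(value)))
    return runs


# activeCompName.value of the standby descriptor, as shown by gdb.
observed = (b"sa\x00\x00\x00mp=CO" + b"\x00" * 8 +
            b"u=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Npm")

# Start of the DN the reporter expected to see.
expected_prefix = b"safComp=COMP1,safS"

print(len(observed))
print(nul_runs(observed))
# Every non-NUL byte in the damaged prefix still matches the expected text.
print(all(o == 0 or o == e for o, e in zip(observed, expected_prefix)))
```

The observed value has the same length as the reported length field (68), and the damaged region looks like the expected prefix with two byte ranges zeroed, which may point to part of the name being overwritten after it was filled in.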




[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-09-14 Thread Praveen
- **status**: unassigned --> assigned
- **assigned_to**: Praveen
- **Part**: - --> d



---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware 
failover**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Thu Sep 08, 2016 06:48 AM UTC
**Owner:** Praveen


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4  ( si-si deps enabled)


Summary :
--
Application SIs are moving to UNASSIGNED state after middleware failover.


Steps followed & Observed behaviour
--
 -> Initially brought up the AMF application (2N model) on two payloads.
 -> All the SIs are in fully assigned state and SUs are in INSERVICE state.
 -> Performed middleware failover.
 -> After the standby became the active controller, the SIs moved to UNASSIGNED 
state. But 'amf-state siass' shows proper output.
 -> Application received CSI remove callbacks after locking the SUs.


Expected behaviour
--
-> As no fault happened on the application, SIs should not move to UNASSIGNED 
state for middleware failover.




[tickets] [opensaf:tickets] #2034 imm: IMMsv README changes fro 5.1

2016-09-14 Thread Neelakanta Reddy
- **status**: accepted --> review



---

** [tickets:#2034] imm: IMMsv README changes fro 5.1**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Wed Sep 14, 2016 08:35 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Sep 14, 2016 08:35 AM UTC
**Owner:** Neelakanta Reddy


This Ticket is to update IMM README for 5.1 IMM Enhancements




[tickets] [opensaf:tickets] #2034 imm: IMMsv README changes fro 5.1

2016-09-14 Thread Neelakanta Reddy



---

** [tickets:#2034] imm: IMMsv README changes fro 5.1**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Wed Sep 14, 2016 08:35 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Sep 14, 2016 08:35 AM UTC
**Owner:** Neelakanta Reddy


This Ticket is to update IMM README for 5.1 IMM Enhancements




[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT

2016-09-14 Thread Chani Srivastava
New logs uploaded


Attachments:

- 
[AdminLogs.zip](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/645a8372/dfe4/attachment/AdminLogs.zip)
 (807.4 kB; application/zip)


---

** [tickets:#2001] IMM: Owner handle is getting corrupt when 
OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 14, 2016 02:34 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback for 
longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait, OR invoke 
any Ccb operation

Observed Behavior:
Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 
svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over 
MDS. Discarding admin op reply.
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 
21 - ignoring
Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down 
on syncronous request, discarding request
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 
2010f> (testOiTmout_verifyAdminOpCallback_37)


Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached

