[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-12 Thread Vu Minh Nguyen
- **status**: review --> fixed
- **assigned_to**: Vu Minh Nguyen -->  nobody 
- **Comment**:

changeset:   8057:8081a9ddd2fc
tag: tip
parent:  8055:fee502a9845c
user:Vu Minh Nguyen 
date:Tue Sep 13 13:15:21 2016 +0700
summary: ntf: cluster rebooted with ntfd crashed on both controllers [#2006]

changeset:   8056:280d00e0eba1
branch:  opensaf-5.1.x
parent:  8054:9e774234274a
user:Vu Minh Nguyen 
date:Tue Sep 13 13:15:21 2016 +0700
summary: ntf: cluster rebooted with ntfd crashed on both controllers [#2006]




---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 11:55 AM UTC
**Owner:** nobody
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)


OS : Suse PPC 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
no PBE )

Ntfd traces and syslog for both controllers attached
*  Ntf Application is running on system
*  Will update ticket with core dump 

Note: The clocks on the system are not synchronized; the node time changes after every reboot.

BT:
~~~
0  0x0fffa0848100 in .raise () from /lib64/libc.so.6
1  0x0fffa0849d10 in .abort () from /lib64/libc.so.6
2  0x0fffa0e34234 in osaf_abort (i_cause=7) at osaf_utility.c:27
3  0x1001a2f8 in NtfLogger::logNotification (this=0x100ba768, notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:247
4  0x10019e60 in NtfLogger::checkQueueAndLog (this=0x100ba768, newNotif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:181
5  0x10019a74 in NtfLogger::log (this=0x100ba768, notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0, isLocal=true) at NtfLogger.cc:137
6  0x1002b528 in NtfAdmin::processNotification (this=0x100ba760, clientId=62, notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc, notificationId=47) at NtfAdmin.cc:203
7  0x1002b938 in NtfAdmin::notificationReceived (this=0x100ba760, clientId=62, notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:257
8  0x1002ec20 in notificationReceived (clientId=62, notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:1012
9  0x10006410 in proc_send_not_msg (cb=0x10073190 <_ntfs_cb>, evt=0x100bb3d0) at ntfs_evt.c:447
10 0x10006b28 in process_api_evt (evt=0x100bb3d0) at ntfs_evt.c:628
11 0x10006c38 in ntfs_process_mbx (mbx=0x10073190 <_ntfs_cb>) at ntfs_evt.c:660
12 0x1000b6f4 in main (argc=2, argv=0xfffc056a7f8) at ntfs_main.c:399
~~~


Active Controller:
May 26 19:41:16 linux-pvra osafntfd[24205]: **osaf_abort(7) called from 0x1001a2f8 with errno=11**
May 26 19:41:16 linux-pvra osafamfnd[24243]: NO 'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
May 26 19:41:16 linux-pvra osafamfnd[24243]: ER safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
May 26 19:41:16 linux-pvra osafamfnd[24243]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
May 26 19:41:16 linux-pvra opensaf_reboot: Rebooting local node; timeout=60
Jun  2 14:11:28 linux-pvra syslog-ng[1639]: syslog-ng starting up; version='2.0.9'


Ntf Trace:
~~~
May 26 19:41:16.767426 osafntfd [24205:lga_api.c:1190] TR logBufSize > strlen(logBuf) + 1
May 26 19:41:16.767436 osafntfd [24205:lga_api.c:1320] << saLogWriteLogAsync
Jun  2 14:11:47.153831 osafntfd [2958:ntfs_main.c:0181] >> initialize
Jun  2 14:11:47.175099 osafntfd [2958:ncs_main_pub.c:0220] TR NCS:PROCESS_ID=2958
~~~



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller

2016-09-12 Thread A V Mahesh (AVM)
- **summary**: osaf: Cluster reset happend due to msgd crashed on both the 
controller --> msg: Cluster reset happend due to msgd crashed on both the 
controller
- **status**: unassigned --> review
- **assigned_to**: A V Mahesh (AVM)



---

** [tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the 
controller**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 08, 2016 02:22 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog)
 (716.7 kB; application/octet-stream)
- 
[Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog)
 (696.4 kB; application/octet-stream)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :
--
A cluster reset happened because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.

Steps followed & Observed behaviour
--
1.  Invoked failovers.
2.  After a few successful failovers, the new active controller rebooted because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. Because the previous active controller was joining the cluster in the standby role at that moment, this resulted in a cluster reset.
[Timeline: Sep  6 00:13:02 sofo-s2]

~~~
Sep  6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, dest:13)
Sep  6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed.
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: NO 'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: ER safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep  6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60
~~~
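The failed assertion enforces that a classic (non-extended) name keeps its length field below SA_MAX_UNEXTENDED_NAME_LENGTH; one plausible trigger is a name whose length field is out of range or uninitialized. A minimal sketch of that invariant follows. It assumes the limit is 256 and uses a hypothetical `ClassicName` struct in place of OpenSAF's real SaNameT and osaf_extended_name API, which differ; it only illustrates checking the condition and rejecting a bad name instead of aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed value of SA_MAX_UNEXTENDED_NAME_LENGTH; check saAis.h in your tree.
constexpr std::size_t kMaxUnextendedNameLength = 256;

// Hypothetical stand-in for the classic SaNameT layout: a 16-bit length
// followed by a fixed-size value buffer. Not OpenSAF's real type.
struct ClassicName {
  std::uint16_t length;
  unsigned char value[kMaxUnextendedNameLength];
};

// Checks the same condition as the failing assertion
// ('length < SA_MAX_UNEXTENDED_NAME_LENGTH') and reports it to the caller
// instead of aborting the process.
bool classic_name_length_ok(const ClassicName* name) {
  return name != nullptr && name->length < kMaxUnextendedNameLength;
}

// Copies a NUL-terminated string into a ClassicName. Strings that would
// violate the invariant are rejected and *name is left untouched.
bool classic_name_set(ClassicName* name, const char* s) {
  assert(name != nullptr && s != nullptr);
  const std::size_t n = std::strlen(s);
  if (n >= kMaxUnextendedNameLength) return false;
  name->length = static_cast<std::uint16_t>(n);
  std::memcpy(name->value, s, n);
  return true;
}
```

Validating at the point where a name is built keeps a bad length from reaching code that asserts on it later, which in a director such as msgd takes the whole node down through nodeFailfast.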

Notes:
1. Syslog attached.
2. msgnd and msgd traces were not enabled.




[tickets] [opensaf:tickets] #1946 log: Update PR document with changes from Cluster Membership (CLM) integration enhancements

2016-09-12 Thread A V Mahesh (AVM)
- Description has changed:

Diff:



--- old
+++ new
@@ -0,0 +1,7 @@
+http://hg.code.sf.net/p/opensaf/documentation
+
+changeset:   188:65546736470c
+tag: tip
+user:A V Mahesh 
+date:Tue Sep 13 10:43:11 2016 +0530
+summary: log: Update PR document with CLM support [#1946]



- **status**: accepted --> fixed



---

** [tickets:#1946] log: Update PR document with changes from Cluster Membership 
(CLM) integration enhancements**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Aug 10, 2016 06:24 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Aug 30, 2016 08:57 AM UTC
**Owner:** A V Mahesh (AVM)


http://hg.code.sf.net/p/opensaf/documentation

changeset:   188:65546736470c
tag: tip
user:A V Mahesh 
date:Tue Sep 13 10:43:11 2016 +0530
summary: log: Update PR document with CLM support [#1946]




[tickets] [opensaf:tickets] #2019 amf: Unit tests fail to build

2016-09-12 Thread Long HB Nguyen
- **status**: accepted --> review



---

** [tickets:#2019] amf: Unit tests fail to build**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Fri Sep 09, 2016 01:08 PM UTC by Anders Widell
**Last Updated:** Mon Sep 12, 2016 03:22 AM UTC
**Owner:** Long HB Nguyen


"make check" fails (32-bit system, GCC version 6.1.1, googletest version 
48ee8e98abc950abd8541e15550b18f8f6cfb3a9):

~~~
make[8]: Entering directory '/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfd/tests'
  CXX  testamfd-test_ckpt_enc_dec.o
In file included from test_ckpt_enc_dec.cc:22:0:
/home/opensaf/googletest/googletest/include/gtest/gtest.h: In instantiation of 'testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int; T2 = int]':
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1421:23:   required from 'static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int; T2 = int; bool lhs_is_null_literal = false]'
test_ckpt_enc_dec.cc:354:3:   required from here
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1392:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
   if (lhs == rhs) {
   ^~
/home/opensaf/googletest/googletest/include/gtest/gtest.h: In instantiation of 'testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = long long unsigned int; T2 = long long int]':
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1421:23:   required from 'static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = long long unsigned int; T2 = long long int; bool lhs_is_null_literal = false]'
test_ckpt_enc_dec.cc:362:3:   required from here
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1392:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
cc1plus: all warnings being treated as errors
Makefile:814: recipe for target 'testamfd-test_ckpt_enc_dec.o' failed
make[8]: *** [testamfd-test_ckpt_enc_dec.o] Error 1
make[8]: Leaving directory '/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfd/tests'
~~~
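Both errors come from equality assertions whose operands differ in signedness (unsigned int vs int at test_ckpt_enc_dec.cc:354, unsigned long long vs long long at :362). Because the comparison helper deduces each operand type separately, an unsuffixed literal such as `5` stays `int`. One common way to avoid this is to write the expected value as an unsigned literal. The sketch below uses a simplified template standing in for gtest's CmpHelperEQ (it is not gtest code), and `decoded_fields_match` is a hypothetical example; the ticket does not state how the real test was fixed.

```cpp
#include <cassert>

// Simplified stand-in for gtest's internal CmpHelperEQ: the operand types
// are deduced independently, so comparing an unsigned value with the int
// literal 5 instantiates it with T1 = unsigned int, T2 = int, and
// "lhs == rhs" then triggers -Wsign-compare.
template <typename T1, typename T2>
bool cmp_helper_eq(const T1& lhs, const T2& rhs) {
  return lhs == rhs;
}

// Hypothetical check of two decoded fields. The suffixed literals (5u, 7ull)
// give T2 the same signedness as T1, so this compiles cleanly even with
// -Werror=sign-compare.
bool decoded_fields_match(unsigned int u32, unsigned long long u64) {
  return cmp_helper_eq(u32, 5u) && cmp_helper_eq(u64, 7ull);
}
```

In a gtest file the same idea reads `EXPECT_EQ(value, 5u)`. Casting the actual value to a signed type would also compile, but it hides genuine range problems, so matching the literal is usually the safer choice.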




[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-12 Thread A V Mahesh (AVM)
- Description has changed:

Diff:



--- old
+++ new
@@ -1,3 +1,10 @@
+OS environment:
+
+Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
+4.4.7 kernel
+Network eth0, bonded, OVS (I have tried all of them and the problem is 
there in all configurations)
+
+
 In 20% of the cases a "reboot -f" on  controller2 is not detected and acted 
on. What is in the mds.log is .
 
 Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>






---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Tue Sep 13, 2016 04:22 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


OS environment:

Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network eth0, bonded, OVS (I have tried all of them and the problem is 
there in all configurations)


In 20% of the cases, a "reboot -f" on controller2 is not detected or acted on. This is what the mds.log shows:

~~~
Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
~~~

Even so, nothing in the syslog indicates that controller2 has left the cluster. This is with TCP.
Only when the node comes back online (without OpenSAF being started) does controller1 finally notice and fail over the apps.

When the reboot is not detected, the TCP keepalives stop and the connection goes into retransmissions instead. I have attached 2 tshark sessions captured on controller1, capturing traffic between controller1 and controller2. The failed reboot detection is captured in "ctrl2_failed_detection.trc" and a working detection in "ctrl2_working.trc". I have also attached all logs in /var/log/opensaf and the syslog (all from controller1).

It appears that we are hitting something similar to http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect

// Jonas




[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-12 Thread A V Mahesh (AVM)
>>Tested the patch and ended up with split brain after 4th reboot. Both 
>>controllers think they are active while they can ping each other perfectly 
>>fine. I will try to reproduce and collect logs

Can you please elaborate on the test sequence in which you end up with split brain:

1)  Do all nodes in the cluster now detect the rebooted controller (`Lost contact with 'SC-2'`) within 1.5 seconds?
2)  Without this patch, you are not able to see `Lost contact with 'SC-2'` on any node within 1.5 seconds; what is the current behavior?
3)  Does the split brain occur after the rebooted controller (reboot -f) rejoins?
4)  Does the split brain occur after `reboot -f` is issued on a controller, without it going for a reboot?



---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Mon Sep 12, 2016 07:59 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)






[tickets] [opensaf:tickets] #2013 IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT

2016-09-12 Thread Neelakanta Reddy
- **status**: unassigned --> assigned
- **assigned_to**: Neelakanta Reddy



---

** [tickets:#2013] IMM: Search Handle getting corrupt when 
saImmOmSearchNext_2() returns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC1
**Created:** Thu Sep 08, 2016 12:10 PM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 12:10 PM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[SearchTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2013/attachment/SearchTmOut.zip)
 (883.9 kB; application/zip)


OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes

Summary:
Steps to Reproduce
1. Create a runtime/config object.
2. Do SearchInitialize().
3. Delete the object created in step 1.
4. Do SearchNext().
5. Do SearchNext() again.


Observed Behavior:
Step 4 returns SA_AIS_ERR_TIMEOUT (expected).
Step 5 returns SA_AIS_ERR_BAD_HANDLE (**SA_AIS_ERR_NOT_EXIST is expected**).

**Note: Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached




[tickets] [opensaf:tickets] #1997 IMM: immnd fails to update si while bringing up opensaf with 2PBE

2016-09-12 Thread Neelakanta Reddy
- **status**: unassigned --> assigned
- **assigned_to**: Neelakanta Reddy



---

** [tickets:#1997] IMM: immnd fails to update si while bringing up opensaf with 
2PBE**

**Status:** assigned
**Milestone:** 5.1.RC1
**Created:** Fri Sep 02, 2016 11:46 AM UTC by Chani Srivastava
**Last Updated:** Fri Sep 02, 2016 11:46 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[LogAMF.zip](https://sourceforge.net/p/opensaf/tickets/1997/attachment/LogAMF.zip)
 (432.4 kB; application/zip)


setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
2PBE enabled

Bring up OpenSAF on a controller with 2PBE enabled; IMMND throws an error.
Attachments: syslog, amfd and immnd traces

Sep  2 16:54:13 SLOT1 osafimmpbed: WA Start prepare for ccb: 10004/4294967300 towards slave PBE returned: '12' from Immsv
Sep  2 16:54:13 SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA update Ccb:10004/4294967300 towards PBE-B
Sep  2 16:54:13 SLOT1 osafimmpbed: NO 2PBE Error (18) in PRTA update (ccbId:10004)
**Sep  2 16:54:13 SLOT1 osafimmnd[3632]: WA update of PERSISTENT runtime attributes in object 'safSi=NoRed3,safApp=OpenSAF' REVERTED. PBE rc:18
Sep  2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18**
Sep  2 16:54:14 SLOT1 osafimmnd[3632]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
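The numeric codes in this log (the '12' returned by the slave PBE, and the 18 in "2PBE Error (18)" and "PBE rc:18") appear to be SaAisErrorT values from the SAF AIS specification. Read that way, 12 is SA_AIS_ERR_NOT_EXIST and 18 is SA_AIS_ERR_NO_RESOURCES. A small lookup helper for decoding such codes is sketched below. It is hypothetical and not part of OpenSAF, and it covers only values 1 to 18.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a numeric SaAisErrorT value, as printed in
// OpenSAF logs, to its symbolic name. Only codes 1..18 are covered.
std::string ais_error_name(int rc) {
  static const char* const kNames[] = {
      "SA_AIS_OK",                  // 1
      "SA_AIS_ERR_LIBRARY",         // 2
      "SA_AIS_ERR_VERSION",         // 3
      "SA_AIS_ERR_INIT",            // 4
      "SA_AIS_ERR_TIMEOUT",         // 5
      "SA_AIS_ERR_TRY_AGAIN",       // 6
      "SA_AIS_ERR_INVALID_PARAM",   // 7
      "SA_AIS_ERR_NO_MEMORY",       // 8
      "SA_AIS_ERR_BAD_HANDLE",      // 9
      "SA_AIS_ERR_BUSY",            // 10
      "SA_AIS_ERR_ACCESS",          // 11
      "SA_AIS_ERR_NOT_EXIST",       // 12
      "SA_AIS_ERR_NAME_TOO_LONG",   // 13
      "SA_AIS_ERR_EXIST",           // 14
      "SA_AIS_ERR_NO_SPACE",        // 15
      "SA_AIS_ERR_INTERRUPT",       // 16
      "SA_AIS_ERR_NAME_NOT_FOUND",  // 17
      "SA_AIS_ERR_NO_RESOURCES",    // 18
  };
  constexpr int kCount = static_cast<int>(sizeof(kNames) / sizeof(kNames[0]));
  static_assert(kCount == 18, "table must cover codes 1..18");
  if (rc < 1 || rc > kCount) return "unknown (" + std::to_string(rc) + ")";
  return kNames[rc - 1];
}
```

Read the same way, the "create FAILED 13" in ticket #2023 decodes to SA_AIS_ERR_NAME_TOO_LONG, which matches the ERR_NAME_TOO_LONG reported in its immnd trace.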

Notes:
1. OpenSAF starts successfully.
2. The issue is not seen with 1PBE.

Once the controller is up, `amf-state si` gives:

~~~
safSi=SC-2N,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=NoRed4,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed1,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=NoRed2,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed3,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
~~~






[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT

2016-09-12 Thread Neelakanta Reddy
- **status**: unassigned --> assigned
- **assigned_to**: Neelakanta Reddy



---

** [tickets:#2001] IMM: Owner handle is getting corrupt when 
OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 06, 2016 10:11 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while the callback waits for longer than the OI_CALLBACK_TIMEOUT value.
2. Invoke saImmOmAdminOperationInvokeAsync_2() again without waiting, or invoke any CCB operation.

Observed Behavior:
Step 1 returns SA_AIS_ERR_TIMEOUT (expected).
Step 2 returns SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected).

~~~
Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file /tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 (testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 21 - ignoring
Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down on syncronous request, discarding request
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
~~~


Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached




[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)

2016-09-12 Thread Minh Hon Chau
Hi Srikanth,

One finding from the amfd and immnd traces: it looks like the maximum DN size allowed by IMM is 2048, while this DN has size 2614.

In the amfd trace:

~~~
Sep 12 18:22:40.609218 osafamfd [4481:imma_oi_api.c:2786] >> rt_object_create_common
Sep 12 18:22:40.609225 osafamfd [4481:imma_oi_api.c:2892] TR attr:safCSIComp
Sep 12 18:22:40.609230 osafamfd [4481:imma_oi_api.c:2892] TR attr:saAmfCSICompHAState
Sep 12 18:22:40.609235 osafamfd [4481:imma_oi_api.c:2892] TR attr:saAmfCSICompHAReadinessState
Sep 12 18:22:40.609658 osafamfd [4481:imma_oi_api.c:3063] << rt_object_create_common
Sep 12 18:22:40.609683 osafamfd [4481:imm.cc:0163] ER exec: create FAILED 13
~~~


In the immnd trace:

Sep 12 18:22:40.609548 osafimmnd [4430:ImmModel.cc:15622] >> rtObjectCreate: 
cont:0x7fff63d6bca8 connp:0x7fff63d6bc9c nodep:(nil)
Sep 12 18:22:40.609553 osafimmnd [4430:ImmModel.cc:15641] T2 
parentName:safCsi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz_CSIB,safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrst
 uvwxyzabcdefghijklmnopqT
Sep 12 18:22:40.609559 osafimmnd [4430:ImmModel.cc:3064] >> getLongDnsAllowed 
Sep 12 18:22:40.609566 osafimmnd [4430:ImmModel.cc:3093] << getLongDnsAllowed 
Sep 12 18:22:40.609590 osafimmnd [4430:ImmModel.cc:12963] TEST2 T5 Irregular 
name. string size:1317 isgraph(,):32768, pos=444
Sep 12 18:22:40.609613 osafimmnd [4430:ImmModel.cc:15895] T7 ERR_NAME_TOO_LONG: 
DN is too long, size:2614, max size is:2048
Sep 12 18:22:40.609617 osafimmnd [4430:ImmModel.cc:16320] << rtObjectCreate 
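Minh's finding suggests that the runtime object's DN (its RDN joined to the parent DN) exceeds the 2048-character limit IMM reports. The sketch below shows a hypothetical pre-check that a test or application could run before creating such an object. It assumes a child DN is formed as "<rdn>,<parentDn>" and treats the 2048 limit as inclusive; the trace shows neither of these details.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Limit as printed in the immnd trace ("max size is:2048"); treating it as
// inclusive is an assumption.
constexpr std::size_t kMaxDnLength = 2048;

// Hypothetical pre-check: builds "<rdn>,<parentDn>" and reports whether the
// resulting DN fits within kMaxDnLength. The built DN is returned through
// dn_out when it is non-null.
bool child_dn_fits(const std::string& rdn, const std::string& parent_dn,
                   std::string* dn_out) {
  assert(!rdn.empty());
  std::string dn = parent_dn.empty() ? rdn : rdn + "," + parent_dn;
  const bool fits = dn.size() <= kMaxDnLength;
  if (dn_out != nullptr) *dn_out = std::move(dn);
  return fits;
}
```

With filler strings, `child_dn_fits(std::string(1300, 'a'), std::string(1313, 'b'), &dn)` yields a 2614-character DN, the same size as in the trace, and reports that it does not fit.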




---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG 
(13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Mon Sep 12, 2016 12:58 PM UTC
**Owner:** nobody
**Attachments:**

- 
[2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) 
(159.7 kB; application/x-compressed-tar)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE  & longDn feature enabled )
AMF Application : 2N model with SUs mapped on PL-3,PL-4


Summary :
--
Creation of long-DN runtime (RT) objects failed with ERR_TOO_LONG during an unlock operation on an SU.


Steps followed & Observed behaviour
--

-> Initially enabled the longDn feature.

-> Then successfully imported the attached AMF configuration.

-> Then performed unlock-in and unlock operations on the SU, after which the following error was observed in the syslog.

Sep 10 16:11:43 CONTROLLER-2 osafamfnd[4279]: NO Assigned 
'safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
 ACTIVE to 'safSu=SU1,safSg=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq
 
rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmno

[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-12 Thread Jonas Arndt
Tested the patch and ended up with a split brain after the 4th reboot. Both 
controllers think they are active, even though they can ping each other 
perfectly fine. I will try to reproduce it and collect logs.


---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Mon Sep 12, 2016 11:56 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


In 20% of the cases, a "reboot -f" on controller2 is not detected or acted on. 
The mds.log contains:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>

Still, nothing in the syslog indicates that controller2 has left the cluster. 
This is with the TCP transport.
When the node comes back online (without OpenSAF being started), controller1 
finally notices and fails over the applications.

When the reboot is not detected, the TCP keepalives stop and the connection 
goes into retransmissions instead. I have attached two tshark sessions captured 
on controller1, capturing traffic between controller1 and controller2. The 
failed reboot detection is captured in "ctrl2_failed_detection.trc" and a 
working detection in "ctrl2_working.trc". I have also attached all logs in 
/var/log/opensaf and the syslog (all from controller1).

It appears to me that we are hitting something similar to 
http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect

// Jonas


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1685 smfd: Merge rolling to singlestep procedures for several nodes

2016-09-12 Thread Rafael
- **status**: review --> fixed



---

** [tickets:#1685] smfd: Merge rolling to singlestep procedures for several nodes**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Mar 02, 2016 09:20 AM UTC by Rafael
**Last Updated:** Mon Sep 12, 2016 02:43 PM UTC
**Owner:** Rafael


By extending the SMF configuration, we allow SMF to execute procedures in an 
optimised way. Today, SMF can merge rolling procedures into one single step to 
minimize reboots. This ticket would give SMF the additional possibility of 
producing multiple single steps from rolling upgrades, in order to keep some 
services up while still reducing the number of reboots.
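To make the trade-off concrete, here is a small sketch that counts steps when nodes are split, in order, into groups. It is an illustration under stated assumptions, not the SMF implementation: the function names are hypothetical, and it assumes each single step costs one reboot. A group size of 1 corresponds to a plain rolling procedure (one step per node), and a group size equal to the node count corresponds to today's merge into one single step; the option described in this ticket sits in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not SMF code. Nodes are numbered 0..n_nodes-1 and
   split, in order, into groups of at most group_size nodes; each group is
   assumed to become one single step with one reboot. */

/* Number of single steps (rounded-up division). */
static size_t single_step_count(size_t n_nodes, size_t group_size)
{
    assert(group_size > 0);
    return (n_nodes + group_size - 1) / group_size;
}

/* step_of_node[i] receives the index of the step that upgrades node i. */
static void assign_nodes_to_steps(size_t n_nodes, size_t group_size,
                                  size_t *step_of_node)
{
    assert(group_size > 0);
    for (size_t i = 0; i < n_nodes; i++)
        step_of_node[i] = i / group_size;
}
```

For example, 4 nodes in groups of 2 give 2 single steps: fewer reboots than a rolling procedure, while, under the assumption above, half of the nodes stay up during each step.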




[tickets] [opensaf:tickets] #1685 smfd: Merge rolling to singlestep procedures for several nodes

2016-09-12 Thread Rafael
changeset:   8053:774cfc4342ba
branch: opensaf-5.1.x
parent: 8050:602e4938d0c3
user: Rafael Odzakow 
date: Mon Sep 12 09:58:57 2016 +0200
summary: Merge rolling to singlestep procedures for several nodes [#1685]

Node ID 774cfc4342bafae3f77ba229bf05ce2091192a04
Parent  602e4938d0c3a3243e379f415695d727223b4923

changeset:   8052:31f199f17fba
user: Rafael Odzakow 
date: Mon Sep 12 09:58:57 2016 +0200
summary: Merge rolling to singlestep procedures for several nodes [#1685]

Node ID 31f199f17fba1a19128edc49746b4f4b6df74499
Parent  2c08aa065812a805e5efc13e0ac34721460821df


---

** [tickets:#1685] smfd: Merge rolling to singlestep procedures for several nodes**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Wed Mar 02, 2016 09:20 AM UTC by Rafael
**Last Updated:** Tue Aug 30, 2016 08:58 AM UTC
**Owner:** Rafael


By extending the SMF configuration, we allow SMF to execute procedures in an 
optimised way. Today, SMF can merge rolling procedures into one single step to 
minimize reboots. This ticket would give SMF the additional possibility of 
producing multiple single steps from rolling upgrades, in order to keep some 
services up while still reducing the number of reboots.




[tickets] [opensaf:tickets] #2026 base: Build failure when java is enabled

2016-09-12 Thread Anders Widell
- **status**: review --> fixed
- **Comment**:

changeset:   8054:9e774234274a
branch:  opensaf-5.1.x
user:Anders Widell 
date:Mon Sep 12 15:23:14 2016 +0200
summary: base: Fix build problem due to missing include [#2026]

changeset:   8055:fee502a9845c
parent:  8052:31f199f17fba
user:Anders Widell 
date:Mon Sep 12 15:23:14 2016 +0200
summary: base: Fix build problem due to missing include [#2026]

[staging:9e7742]
[staging:31f199]




---

** [tickets:#2026] base: Build failure when java is enabled**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Mon Sep 12, 2016 11:01 AM UTC by Anders Widell
**Last Updated:** Mon Sep 12, 2016 11:33 AM UTC
**Owner:** Anders Widell


Build fails when Java is enabled:

~~~
Making all in ais_api_impl_native
make[3]: Entering directory '/home/opensaf-staging/java/ais_api_impl_native'
  CC   libjava_ais_api_native_la-j_ais_clm_libHandle.lo
In file included from j_ais_clm_libHandle.c:37:0:
../../osaf/libs/core/common/include/osaf_poll.h:46:65: error: unknown type name 'int64_t'
 extern unsigned osaf_poll(struct pollfd* io_fds, nfds_t i_nfds, int64_t i_timeout);
 ^~~
../../osaf/libs/core/common/include/osaf_poll.h:91:39: error: unknown type name 'int64_t'
 extern int osaf_poll_one_fd(int i_fd, int64_t  i_timeout);
   ^~~
j_ais_clm_libHandle.c: In function 'Java_org_opensaf_ais_clm_ClmHandleImpl_invokeSaClmDispatchWhenReady':
j_ais_clm_libHandle.c:726:16: error: implicit declaration of function 'osaf_poll' [-Werror=implicit-function-declaration]
  _pollStatus = osaf_poll(&_readFDs, 1, -1);
^
cc1: all warnings being treated as errors
Makefile:720: recipe for target 'libjava_ais_api_native_la-j_ais_clm_libHandle.lo' failed

~~~
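The errors show that osaf_poll.h uses int64_t without making the type visible, and the fix changeset is summarised as "Fix build problem due to missing include". Presumably the header now includes <stdint.h> itself, but the exact change is not shown here. The sketch below illustrates the self-contained-header principle with a hypothetical wrapper, poll_int64(), that has the same parameter types as osaf_poll(); it is not the OpenSAF implementation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <limits.h>   /* INT_MAX */
#include <poll.h>     /* struct pollfd, nfds_t, poll() */
#include <stdint.h>   /* int64_t: the type the compiler could not find */
#include <unistd.h>   /* pipe(), write(), close() for the usage example */

/* Hypothetical wrapper with osaf_poll()-like parameter types. Because this
   file includes <poll.h> and <stdint.h> itself, it compiles regardless of
   what the including file has pulled in before.
   A negative timeout waits forever; timeouts above INT_MAX ms are clamped to
   what poll() accepts. Unlike a production wrapper, EINTR is not retried. */
static int poll_int64(struct pollfd *fds, nfds_t nfds, int64_t timeout_ms)
{
    int t;
    assert(fds != NULL || nfds == 0);
    if (timeout_ms < 0)
        t = -1;
    else if (timeout_ms > INT_MAX)
        t = INT_MAX;
    else
        t = (int)timeout_ms;
    return poll(fds, nfds, t);
}
```

Applied to a header, the same rule means osaf_poll.h would include <stdint.h> (and <poll.h> for struct pollfd and nfds_t) instead of relying on whatever the including .c file happens to include first.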




[tickets] [opensaf:tickets] #1968 SMF does not handle AMF long DN&RDN support

2016-09-12 Thread elunlen
- **Version**:  --> 5.1



---

** [tickets:#1968] SMF does not handle AMF long DN&RDN support**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Wed Aug 24, 2016 12:51 PM UTC by elunlen
**Last Updated:** Mon Sep 12, 2016 10:11 AM UTC
**Owner:** elunlen


SMF already supports long DNs. However, some checks on AMF-related objects do 
not allow certain DNs to be longer than 255 bytes (or RDNs longer than 64 
bytes). These checks shall be removed, since AMF will support long DNs from 5.1.




[tickets] [opensaf:tickets] #2023 AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)

2016-09-12 Thread Srikanth R
Attaching the configuration and the IMMD & IMMND traces as well.


Attachments:

- 
[2023_longDn.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b0d730dd/21a6/attachment/2023_longDn.tgz)
 (821.2 kB; application/x-compressed)


---

** [tickets:#2023] AMF : Long DN RT objects creation failed with ERR_TOO_LONG (13)**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 10:57 AM UTC by Srikanth R
**Last Updated:** Mon Sep 12, 2016 01:59 AM UTC
**Owner:** nobody
**Attachments:**

- 
[2023.tgz](https://sourceforge.net/p/opensaf/tickets/2023/attachment/2023.tgz) 
(159.7 kB; application/x-compressed-tar)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE  & longDn feature enabled )
AMF Application : 2N model with SUs mapped on PL-3,PL-4


Summary :
--
 Creation of long-DN RT objects failed with ERR_TOO_LONG during the unlock operation of an SU.


Steps followed & Observed behaviour
--

-> Initially enabled the longDn feature.

-> Later imported the attached AMF configuration successfully.

-> Then performed unlock-in and unlock operations on the SU; the following error was observed in syslog.

Sep 10 16:11:43 CONTROLLER-2 osafamfnd[4279]: NO Assigned 
'safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
 ACTIVE to 'safSu=SU1,safSg=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopq
rstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
Sep 10 16:11:43 CONTROLLER-2 osafamfd[4265]: ER exec: create FAILED 13
Sep 10 16:11:46 CONTROLLER-2 osafamfd[4265]:** ER exec: create FAILED 13**


Below is the corresponding trace from osafamfd:


Sep 10 16:11:46.647681 osafamfd [4265:imm.cc:0396] >> execute
Sep 10 16:11:46.647730 osafamfd [4265:imm.cc:0142] >> exec: Create 
safCsi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz_CSIA,safSi=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz,safApp=AmfDemoabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxy
 zabcdefghijklmnopqrstuvT
Sep 10 16:11:46.647783 osafamfd [4265:imma_oi_api.c:2786] >> rt_object_create_common
Sep 10 16:11:46.647879 osafamfd [4265:imma_oi_api.c:2892] TR attr:safCSIComp
Sep 10 16:11:46.647908 osafamfd [4265:imma_oi_api.c:2892] TR attr:saAmfCSICompHAState
Sep 10 16:11:46.647927 osafamfd [4265:imma_oi_api.c:2892] TR attr:saAmfCSICompHAReadinessState
Sep 10 16:11:46.649108 osafamfd [4265:imma_oi_api.c:3063] << rt_object_create_common
Sep 10 16:11:46.649157 osafamfd [4265:imm.cc:0163] ER exec: create FAILED 13





[tickets] [opensaf:tickets] #2027 smf: Unnecessary delay in try again loop for admin operation

2016-09-12 Thread elunlen



---

** [tickets:#2027] smf: Unnecessary delay in try again loop for admin operation**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Sep 12, 2016 12:57 PM UTC by elunlen
**Last Updated:** Mon Sep 12, 2016 12:57 PM UTC
**Owner:** nobody


In SmfAdminOperation::nodeGroupAdminOperation(...), a loop is used to handle the 
OI try-again return code. The loop is written in such a way that a 2-second 
delay (sleep(2)) is always executed, even if the return code from the OI is OK.
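A minimal sketch of a loop that only delays when a retry will actually follow, written in C for brevity even though SmfAdminOperation is C++. The result codes, function names and the injectable sleep hook are hypothetical stand-ins (the hook only lets the behaviour be checked without waiting); the real SMF loop and its OI return codes are not reproduced here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result codes standing in for the OI return codes
   (e.g. OK and TRY_AGAIN); names and values are illustrative only. */
typedef enum { RC_OK, RC_TRY_AGAIN, RC_FAILED } rc_t;

typedef rc_t (*op_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int seconds);

static void sleep_seconds(unsigned int seconds) { (void)sleep(seconds); }

/* Calls op() until it returns something other than RC_TRY_AGAIN or
   max_attempts is reached. The delay is taken only after a TRY_AGAIN and only
   if another attempt follows, so a first-time RC_OK returns immediately. */
static rc_t call_with_try_again(op_fn op, void *ctx, int max_attempts,
                                unsigned int delay_s, sleep_fn do_sleep)
{
    assert(op != NULL && max_attempts > 0);
    if (do_sleep == NULL)
        do_sleep = sleep_seconds;

    rc_t rc = RC_FAILED;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = op(ctx);
        if (rc != RC_TRY_AGAIN)
            break;                 /* OK or hard error: return without delay */
        if (attempt < max_attempts)
            do_sleep(delay_s);     /* back off only before a retry */
    }
    return rc;
}

/* Test doubles for the usage example: an operation that reports TRY_AGAIN a
   fixed number of times, and a sleep that only counts its calls. */
struct fake_op { int try_again_left; int calls; };
static int sleep_calls;

static rc_t fake_admin_op(void *ctx)
{
    struct fake_op *f = ctx;
    f->calls++;
    if (f->try_again_left > 0) {
        f->try_again_left--;
        return RC_TRY_AGAIN;
    }
    return RC_OK;
}

static void counting_sleep(unsigned int seconds) { (void)seconds; sleep_calls++; }
```

With this structure, an operation that returns OK on the first attempt never sleeps; only a TRY_AGAIN that is followed by another attempt costs the 2-second delay.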




[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-12 Thread Anders Widell
The constant TCP_USER_TIMEOUT is not part of LSB, so we will need to add the 
following to our code anyway:

~~~
#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18
#endif
~~~

We should bump the minimum required Linux version to 3.18 after introducing 
this fix, since the TCP_USER_TIMEOUT feature didn't work properly in earlier 
Linux versions according to the article *Improving HA Failures with TCP 
Timeouts* you referred to.
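On Linux, keepalive probes are only sent while the connection has no unacknowledged data; once data is in flight, the connection is in retransmission mode and keepalives are suspended, which matches the captures described in this ticket. TCP_USER_TIMEOUT bounds how long transmitted data may remain unacknowledged before the kernel aborts the connection. A minimal sketch of setting it together with keepalive options on a Linux TCP socket follows. The timeout values are placeholders that are not taken from the attached patch, and enable_dead_peer_detection() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT */
#include <sys/socket.h>
#include <unistd.h>

/* Fallback as proposed in the comment: 18 is the Linux value. */
#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18
#endif

/* Hypothetical helper: enables keepalive probing for idle connections and
   caps how long sent data may stay unacknowledged (retransmissions included).
   Returns 0 on success, -1 on the first failing setsockopt() (errno is set). */
static int enable_dead_peer_detection(int fd, unsigned int user_timeout_ms,
                                      int idle_s, int interval_s, int probes)
{
    const int on = 1;
    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    /* Applies while data is in flight, i.e. exactly when keepalives are suspended. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
                   sizeof user_timeout_ms) < 0)
        return -1;
    return 0;
}

/* Reads back the current TCP_USER_TIMEOUT in milliseconds. */
static int get_tcp_user_timeout(int fd, unsigned int *timeout_ms)
{
    socklen_t len = sizeof *timeout_ms;
    return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}
```

The option can be set on a socket before it is connected, so a transport could apply it right after socket() alongside its other options.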


---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Mon Sep 12, 2016 07:10 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


In 20% of the cases, a "reboot -f" on controller2 is not detected or acted on. 
The mds.log contains:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>

Still, nothing in the syslog indicates that controller2 has left the cluster. 
This is with the TCP transport.
When the node comes back online (without OpenSAF being started), controller1 
finally notices and fails over the applications.

When the reboot is not detected, the TCP keepalives stop and the connection 
goes into retransmissions instead. I have attached two tshark sessions captured 
on controller1, capturing traffic between controller1 and controller2. The 
failed reboot detection is captured in "ctrl2_failed_detection.trc" and a 
working detection in "ctrl2_working.trc". I have also attached all logs in 
/var/log/opensaf and the syslog (all from controller1).

It appears to me that we are hitting something similar to 
http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect

// Jonas




[tickets] [opensaf:tickets] #2026 base: Build failure when java is enabled

2016-09-12 Thread Anders Widell
- **status**: accepted --> review



---

** [tickets:#2026] base: Build failure when java is enabled**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Mon Sep 12, 2016 11:01 AM UTC by Anders Widell
**Last Updated:** Mon Sep 12, 2016 11:01 AM UTC
**Owner:** Anders Widell


Build fails when Java is enabled:

~~~
Making all in ais_api_impl_native
make[3]: Entering directory '/home/opensaf-staging/java/ais_api_impl_native'
  CC   libjava_ais_api_native_la-j_ais_clm_libHandle.lo
In file included from j_ais_clm_libHandle.c:37:0:
../../osaf/libs/core/common/include/osaf_poll.h:46:65: error: unknown type name 'int64_t'
 extern unsigned osaf_poll(struct pollfd* io_fds, nfds_t i_nfds, int64_t i_timeout);
 ^~~
../../osaf/libs/core/common/include/osaf_poll.h:91:39: error: unknown type name 'int64_t'
 extern int osaf_poll_one_fd(int i_fd, int64_t  i_timeout);
   ^~~
j_ais_clm_libHandle.c: In function 'Java_org_opensaf_ais_clm_ClmHandleImpl_invokeSaClmDispatchWhenReady':
j_ais_clm_libHandle.c:726:16: error: implicit declaration of function 'osaf_poll' [-Werror=implicit-function-declaration]
  _pollStatus = osaf_poll(&_readFDs, 1, -1);
^
cc1: all warnings being treated as errors
Makefile:720: recipe for target 'libjava_ais_api_native_la-j_ais_clm_libHandle.lo' failed

~~~




[tickets] [opensaf:tickets] #2026 base: Build failure when java is enabled

2016-09-12 Thread Anders Widell



---

** [tickets:#2026] base: Build failure when java is enabled**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Mon Sep 12, 2016 11:01 AM UTC by Anders Widell
**Last Updated:** Mon Sep 12, 2016 11:01 AM UTC
**Owner:** Anders Widell


Build fails when Java is enabled:

~~~
Making all in ais_api_impl_native
make[3]: Entering directory '/home/opensaf-staging/java/ais_api_impl_native'
  CC   libjava_ais_api_native_la-j_ais_clm_libHandle.lo
In file included from j_ais_clm_libHandle.c:37:0:
../../osaf/libs/core/common/include/osaf_poll.h:46:65: error: unknown type name 'int64_t'
 extern unsigned osaf_poll(struct pollfd* io_fds, nfds_t i_nfds, int64_t i_timeout);
 ^~~
../../osaf/libs/core/common/include/osaf_poll.h:91:39: error: unknown type name 'int64_t'
 extern int osaf_poll_one_fd(int i_fd, int64_t  i_timeout);
   ^~~
j_ais_clm_libHandle.c: In function 'Java_org_opensaf_ais_clm_ClmHandleImpl_invokeSaClmDispatchWhenReady':
j_ais_clm_libHandle.c:726:16: error: implicit declaration of function 'osaf_poll' [-Werror=implicit-function-declaration]
  _pollStatus = osaf_poll(&_readFDs, 1, -1);
^
cc1: all warnings being treated as errors
Makefile:720: recipe for target 'libjava_ais_api_native_la-j_ais_clm_libHandle.lo' failed

~~~




[tickets] [opensaf:tickets] #1906 clm: add support for long DN

2016-09-12 Thread Mathi Naickan
changeset:   187:ce165392b313
tag: tip
user:mathi.naic...@oracle.com
date:Mon Sep 12 15:47:47 2016 +0530
summary: clm: doc update for long RDN support in 5.1 [#1906]



---

** [tickets:#1906] clm: add support for long DN**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Mon Jul 04, 2016 01:10 PM UTC by Zoran Milinkovic
**Last Updated:** Tue Aug 30, 2016 06:10 AM UTC
**Owner:** Zoran Milinkovic


Ticket [#191] introduces support for long DNs.
This ticket adds support for long RDNs to CLM.

Today, there is no reason to support long DNs in CLM.
The main reason for adding long DN support is to support long RDNs. In the 
SaClmNode class, the safNode attribute may contain a hostname of up to 63 
bytes. Together with the prefix "safNode=", the RDN exceeds the 64-byte RDN limit.

The 255-byte DN limit will remain after applying the patch. Only long RDNs 
will be supported.

The same limitation applies to PLM entities in CLM.
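The arithmetic behind the RDN problem: the prefix "safNode=" is 8 bytes, so a 63-byte hostname gives an RDN of 8 + 63 = 71 bytes, above the 64-byte limit. The helper names below are hypothetical, and treating an RDN of exactly 64 bytes as still allowed is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* RDN limit quoted in the ticket, in bytes. */
#define RDN_MAX_BYTES 64

/* Hypothetical helper: byte length of the RDN "safNode=<hostname>". */
static size_t safnode_rdn_length(const char *hostname)
{
    assert(hostname != NULL);
    return strlen("safNode=") + strlen(hostname);   /* 8 + hostname length */
}

/* Assumes an RDN of exactly RDN_MAX_BYTES bytes is still accepted. */
static bool safnode_rdn_too_long(const char *hostname)
{
    return safnode_rdn_length(hostname) > RDN_MAX_BYTES;
}
```

Under that assumption, any hostname longer than 56 bytes already pushes the safNode RDN past the limit.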







[tickets] [opensaf:tickets] #1968 SMF does not handle AMF long DN&RDN support

2016-09-12 Thread Mathi Naickan
- **Comment**:

Please update the "version" field if you are aware of it.



---

** [tickets:#1968] SMF does not handle AMF long DN&RDN support**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Wed Aug 24, 2016 12:51 PM UTC by elunlen
**Last Updated:** Tue Aug 30, 2016 08:56 AM UTC
**Owner:** elunlen


SMF already supports long DNs. However, some checks on AMF-related objects do 
not allow certain DNs to be longer than 255 bytes (or RDNs longer than 64 
bytes). These checks shall be removed, since AMF will support long DNs from 5.1.




[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2016-09-12 Thread A V Mahesh (AVM)
- **status**: unassigned --> accepted
- **assigned_to**: A V Mahesh (AVM)



---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create checkpoint**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 08, 2016 07:28 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the test 
scenarios ended, the following IMM objects and replicas were not deleted:
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2. When a checkpoint was created with the earlier name (all_replicas_ckpt_name_101), 
the following error was observed in syslog. CkptOpen also failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for ckpt_id:2


4. After some time, ckptd segfaulted on the active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, pKey=0x7d22531c "\017\001\002") at patricia.c:435
2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706
3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at cpd_evt.c:1378
4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6-  0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled




[tickets] [opensaf:tickets] #2018 amf: Build failure on 32-bit system

2016-09-12 Thread Minh Hon Chau
- **status**: review --> fixed
- **assigned_to**: Long HB Nguyen -->  nobody 



---

** [tickets:#2018] amf: Build failure on 32-bit system**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Fri Sep 09, 2016 01:05 PM UTC by Anders Widell
**Last Updated:** Mon Sep 12, 2016 09:58 AM UTC
**Owner:** nobody


Build fails on 32-bit systems (GCC version 6.1.1):

~~~
make[7]: Entering directory '/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfnd'
  CXX  osafamfnd-comp.o
In file included from ../../../../../osaf/libs/core/include/ncs_osprm.h:32:0,
 from ../../../../../osaf/libs/core/leap/include/ncsdlib.h:33,
 from ../../../../../osaf/libs/common/amf/include/amf.h:37,
 from ../../../../../osaf/services/saf/amf/amfnd/include/avnd.h:38,
 from comp.cc:35:
comp.cc: In function 'uint32_t avnd_amfa_mds_info_evh(AVND_CB*, AVND_EVT*)':
../../../../../osaf/libs/core/common/include/logtrace.h:145:127: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'MDS_DEST {aka long long unsigned int}' [-Werror=format=]
 #define TRACE_ENTER2(format, args...) _logtrace_trace(__FILE__, __LINE__, CAT_TRACE_ENTER, "%s: " format, __FUNCTION__, ##args)
   ^
comp.cc:3016:3: note: in expansion of macro 'TRACE_ENTER2'
   TRACE_ENTER2("mds_dest :%lu, MDS version:%d",
   ^~~~
cc1plus: all warnings being treated as errors
Makefile:765: recipe for target 'osafamfnd-comp.o' failed
~~~
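The error arises because `MDS_DEST` is a 64-bit unsigned type that is `long long unsigned int` on this 32-bit target (per the error text) but, presumably, `long unsigned int` on the 64-bit builds where `%lu` compiled cleanly. One portable way to avoid this class of error is the `PRIu64` macro from `<inttypes.h>`. The sketch below assumes `MDS_DEST` can be treated as `uint64_t`; the helper `format_dest` is hypothetical and only stands in for the `TRACE_ENTER2` call. It is an illustration, not necessarily the committed fix.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: MDS_DEST is a 64-bit unsigned integer, as the GCC error
 * ("MDS_DEST {aka long long unsigned int}") suggests on 32-bit builds. */
typedef uint64_t MDS_DEST;

/* Hypothetical helper mirroring the failing TRACE_ENTER2 format string.
 * PRIu64 expands to the right conversion for uint64_t on each target, so
 * the same source passes -Werror=format on both 32-bit and 64-bit builds. */
static int format_dest(char *buf, size_t len, MDS_DEST dest, int version) {
  return snprintf(buf, len, "mds_dest :%" PRIu64 ", MDS version:%d", dest,
                  version);
}
```

An equivalent alternative is to cast the argument to `unsigned long long` and use `%llu`. Note that in C++ with older glibc versions, `__STDC_FORMAT_MACROS` may need to be defined before including `<inttypes.h>` for `PRIu64` to be visible.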




[tickets] [opensaf:tickets] #2018 amf: Build failure on 32-bit system

2016-09-12 Thread Minh Hon Chau
changeset:   8051:2c08aa065812
tag: tip
parent:  8047:9dd223204b0a
user:Long Nguyen 
date:Mon Sep 12 19:52:02 2016 +1000
summary: AMF: Build failure on 32-bit system [#2018]

changeset:   8050:602e4938d0c3
branch:  opensaf-5.1.x
parent:  8048:2588d9e5fe8a
user:Long Nguyen 
date:Mon Sep 12 19:48:38 2016 +1000
summary: AMF: Build failure on 32-bit system [#2018]




---

** [tickets:#2018] amf: Build failure on 32-bit system**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Fri Sep 09, 2016 01:05 PM UTC by Anders Widell
**Last Updated:** Mon Sep 12, 2016 07:50 AM UTC
**Owner:** Long HB Nguyen


Build fails on 32-bit systems (GCC version 6.1.1):

~~~
make[7]: Entering directory '/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfnd'
  CXX  osafamfnd-comp.o
In file included from ../../../../../osaf/libs/core/include/ncs_osprm.h:32:0,
                 from ../../../../../osaf/libs/core/leap/include/ncsdlib.h:33,
                 from ../../../../../osaf/libs/common/amf/include/amf.h:37,
                 from ../../../../../osaf/services/saf/amf/amfnd/include/avnd.h:38,
                 from comp.cc:35:
comp.cc: In function 'uint32_t avnd_amfa_mds_info_evh(AVND_CB*, AVND_EVT*)':
../../../../../osaf/libs/core/common/include/logtrace.h:145:127: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'MDS_DEST {aka long long unsigned int}' [-Werror=format=]
 #define TRACE_ENTER2(format, args...) _logtrace_trace(__FILE__, __LINE__, CAT_TRACE_ENTER, "%s: " format, __FUNCTION__, ##args)
   ^
comp.cc:3016:3: note: in expansion of macro 'TRACE_ENTER2'
   TRACE_ENTER2("mds_dest :%lu, MDS version:%d",
   ^~~~
cc1plus: all warnings being treated as errors
Makefile:765: recipe for target 'osafamfnd-comp.o' failed
~~~




[tickets] [opensaf:tickets] #2025 Cluster reset happened during headless as CLMNA faulted due to healthCheckcallbackTimeout

2016-09-12 Thread Ritu Raj
- **summary**: Cluster reset happened during headless as CLMNA faulted due to csiSetcallbackTimeout --> Cluster reset happened during headless as CLMNA faulted due to healthCheckcallbackTimeout
- Description has changed:

Diff:



--- old
+++ new
@@ -4,13 +4,13 @@
 Setup : 5 nodes ( 3 controllers and 2 payloads with headless feature enabled & 
1PBE with 10K objects
 
 #Summary :
-Cluster reset happend  during headless as CLMNA  faulted due to 
csiSetcallbackTimeout 
+Cluster reset happend  during headless as CLMNA  faulted due to 
healthCheckcallbackTimeout
 
 #Steps followed & Observed behaviour
 1. Invoked headless by killing Active followed by Standby and Spare Controller,
 maintaining gap of 6 sec between controller reboot
 
-2. After couple of failover, CLMNA faulted on PL-4 and PL-5 due to 
csiSetcallbackTimeout, and cluster reset happened.
+2. After couple of failover, CLMNA faulted on PL-4 and PL-5 due to 
healthCheckcallbackTimeout, and cluster reset happened.
 
 Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO SU failover probation timer 
started (timeout: 12000 ns)
 Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO Performing failover of 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (SU failover count: 1)






---

** [tickets:#2025] Cluster reset happened during headless as CLMNA faulted due to healthCheckcallbackTimeout**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Sep 12, 2016 07:32 AM UTC by Ritu Raj
**Last Updated:** Mon Sep 12, 2016 07:32 AM UTC
**Owner:** nobody
**Attachments:**

- 
[PL-4.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/PL-4.tar.bz2)
 (38.4 kB; application/x-bzip)
- 
[PL-5.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/PL-5.tar.bz2)
 (59.0 kB; application/x-bzip)
- 
[SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/SC-1.tar.bz2)
 (160.7 kB; application/x-bzip)
- 
[SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/SC-2.tar.bz2)
 (107.4 kB; application/x-bzip)
- 
[SC-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/SC-3.tar.bz2)
 (109.8 kB; application/x-bzip)


#Environment details
OS: SUSE 64-bit
Changeset: 7997 (5.1.FC)
Setup: 5 nodes (3 controllers and 2 payloads, headless feature enabled, 1 PBE with 10K objects)

#Summary:
A cluster reset happened during headless because CLMNA faulted due to healthCheckcallbackTimeout.

#Steps followed & observed behaviour
1. Invoked headless by killing the active controller, followed by the standby and spare controllers, with a gap of 6 seconds between controller reboots.

2. After a couple of failovers, CLMNA faulted on PL-4 and PL-5 due to healthCheckcallbackTimeout, and a cluster reset happened.

Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO SU failover probation timer started (timeout: 12000 ns)
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO Performing failover of 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (SU failover count: 1)
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO 'safComp=CLMNA,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO 'safComp=CLMNA,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: ER safComp=CLMNA,safSu=PL-4,safSg=NoRed,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: Rebooting OpenSAF NodeId = 132111 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 132111, SupervisionTime = 60


Notes:
1. The node clocks are not synchronized. Relative to PL-4 (Sep 10 17:52:46 on SCALE_SLOT-74), the corresponding times on the other nodes are:
Sep 27 18:46:53: SC-1
Oct 03 10:02:54: SC-2
Oct 03 10:26:44: SC-3
Sep 10 17:54:46: PL-5
No syslog entries were logged on the controllers during the above time.

2. Syslogs of SC-1, SC-2, SC-3, PL-4 and PL-5 are attached.
3. clmnd traces were not enabled.
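Because the node clocks differ, correlating the attached syslogs means shifting each node's timestamps by its offset from PL-4, using the reference times listed in note 1. A small sketch of that arithmetic follows. It assumes all timestamps fall in the same calendar year (2016, going by the thread date) and that the listed reference pairs are exact; the helper names are hypothetical.

```c
/* Days before the first of each month in a leap year (2016 is assumed).
 * Only differences are used, so the leap day cancels for Sep/Oct dates. */
static const int days_before_month[12] = {0,   31,  60,  91,  121, 152,
                                          182, 213, 244, 274, 305, 335};

/* Seconds since Jan 1 00:00:00 of the (assumed common) year. */
static long seconds_of_year(int month, int day, int h, int m, int s) {
  long days = days_before_month[month - 1] + (day - 1);
  return days * 86400L + h * 3600L + m * 60L + s;
}

/* Offset of another node's clock relative to PL-4's reference time
 * Sep 10 17:52:46. A positive value means that node's clock is ahead. */
static long offset_from_pl4(int month, int day, int h, int m, int s) {
  return seconds_of_year(month, day, h, m, s) -
         seconds_of_year(9, 10, 17, 52, 46);
}
```

For example, `offset_from_pl4(10, 3, 10, 2, 54)` gives 1959008 s (22 days 16:10:08) for SC-2, and `offset_from_pl4(9, 10, 17, 54, 46)` gives 120 s for PL-5; subtracting a node's offset from its syslog timestamps maps them onto PL-4's clock.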




[tickets] [opensaf:tickets] #2018 amf: Build failure on 32-bit system

2016-09-12 Thread Long HB Nguyen
- **status**: accepted --> review



---

** [tickets:#2018] amf: Build failure on 32-bit system**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Fri Sep 09, 2016 01:05 PM UTC by Anders Widell
**Last Updated:** Mon Sep 12, 2016 05:23 AM UTC
**Owner:** Long HB Nguyen


Build fails on 32-bit systems (GCC version 6.1.1):

~~~
make[7]: Entering directory '/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfnd'
  CXX  osafamfnd-comp.o
In file included from ../../../../../osaf/libs/core/include/ncs_osprm.h:32:0,
                 from ../../../../../osaf/libs/core/leap/include/ncsdlib.h:33,
                 from ../../../../../osaf/libs/common/amf/include/amf.h:37,
                 from ../../../../../osaf/services/saf/amf/amfnd/include/avnd.h:38,
                 from comp.cc:35:
comp.cc: In function 'uint32_t avnd_amfa_mds_info_evh(AVND_CB*, AVND_EVT*)':
../../../../../osaf/libs/core/common/include/logtrace.h:145:127: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'MDS_DEST {aka long long unsigned int}' [-Werror=format=]
 #define TRACE_ENTER2(format, args...) _logtrace_trace(__FILE__, __LINE__, CAT_TRACE_ENTER, "%s: " format, __FUNCTION__, ##args)
   ^
comp.cc:3016:3: note: in expansion of macro 'TRACE_ENTER2'
   TRACE_ENTER2("mds_dest :%lu, MDS version:%d",
   ^~~~
cc1plus: all warnings being treated as errors
Makefile:765: recipe for target 'osafamfnd-comp.o' failed
~~~




[tickets] [opensaf:tickets] #2025 Cluster reset happened during headless as CLMNA faulted due to csiSetcallbackTimeout

2016-09-12 Thread Ritu Raj



---

** [tickets:#2025] Cluster reset happened during headless as CLMNA faulted due to csiSetcallbackTimeout**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon Sep 12, 2016 07:32 AM UTC by Ritu Raj
**Last Updated:** Mon Sep 12, 2016 07:32 AM UTC
**Owner:** nobody
**Attachments:**

- 
[PL-4.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/PL-4.tar.bz2)
 (38.4 kB; application/x-bzip)
- 
[PL-5.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/PL-5.tar.bz2)
 (59.0 kB; application/x-bzip)
- 
[SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/SC-1.tar.bz2)
 (160.7 kB; application/x-bzip)
- 
[SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/SC-2.tar.bz2)
 (107.4 kB; application/x-bzip)
- 
[SC-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2025/attachment/SC-3.tar.bz2)
 (109.8 kB; application/x-bzip)


#Environment details
OS: SUSE 64-bit
Changeset: 7997 (5.1.FC)
Setup: 5 nodes (3 controllers and 2 payloads, headless feature enabled, 1 PBE with 10K objects)

#Summary:
A cluster reset happened during headless because CLMNA faulted due to csiSetcallbackTimeout.

#Steps followed & observed behaviour
1. Invoked headless by killing the active controller, followed by the standby and spare controllers, with a gap of 6 seconds between controller reboots.

2. After a couple of failovers, CLMNA faulted on PL-4 and PL-5 due to csiSetcallbackTimeout, and a cluster reset happened.

Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO SU failover probation timer started (timeout: 12000 ns)
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO Performing failover of 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (SU failover count: 1)
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO 'safComp=CLMNA,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover'
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: NO 'safComp=CLMNA,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: ER safComp=CLMNA,safSu=PL-4,safSg=NoRed,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover
Sep 10 17:52:46 SCALE_SLOT-74 osafamfnd[12421]: Rebooting OpenSAF NodeId = 132111 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 132111, SupervisionTime = 60


Notes:
1. The node clocks are not synchronized. Relative to PL-4 (Sep 10 17:52:46 on SCALE_SLOT-74), the corresponding times on the other nodes are:
Sep 27 18:46:53: SC-1
Oct 03 10:02:54: SC-2
Oct 03 10:26:44: SC-3
Sep 10 17:54:46: PL-5
No syslog entries were logged on the controllers during the above time.

2. Syslogs of SC-1, SC-2, SC-3, PL-4 and PL-5 are attached.
3. clmnd traces were not enabled.




[tickets] [opensaf:tickets] #2022 AMF: amfd asserted for NG lock operation (quiesced timeout - Nway model)

2016-09-12 Thread Praveen
- **status**: unassigned --> assigned
- **assigned_to**: Praveen



---

** [tickets:#2022] AMF: amfd asserted for NG lock operation (quiesced timeout - Nway model)**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Sat Sep 10, 2016 09:58 AM UTC by Srikanth R
**Last Updated:** Sat Sep 10, 2016 09:58 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[createAppTestApp.sh](https://sourceforge.net/p/opensaf/tickets/2022/attachment/createAppTestApp.sh)
 (15.8 kB; text/x-shellscript)


Environment details
--
OS: SUSE 64-bit
Changeset: 7997 (5.1.FC)
Setup: 5 nodes (2 controllers and 3 payloads, headless feature enabled, no PBE)
AMF Application: NPM model with SUs mapped on SC-2, PL-3, PL-4


Summary :
--
amfd on both controllers asserted when an Nway application failed the CSI set (quiesced) callback during a lock operation on a node group.


Steps followed & Observed behaviour
--

-> Hosted an Nway application on PL-3, PL-4 and SC-2 and brought up the application. The configuration is attached to the ticket.
-> Created a node group with all three nodes.
-> Ensured that one of the components would not respond to the quiesced callback.
-> Performed the lock operation on the node group.
-> amfd on both controllers asserted with the following backtrace:


0  0x7f66fbc6fb55 in raise () from /lib64/libc.so.6
1  0x7f66fbc71131 in abort () from /lib64/libc.so.6
2  0x7f66fda6816a in __osafassert_fail (__file=0x51214d "su.cc", __line=2022, __func=0x513aa0 "dec_curr_stdby_si", __assertion=0x51355f "saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:281
3  0x004d68cd in AVD_SU::dec_curr_stdby_si (this=0x7ccf40) at su.cc:2022
4  0x004be804 in avd_susi_update_assignment_counters (susi=0x78c670, action=AVSV_SUSI_ACT_DEL, current_ha_state=0, new_ha_state=0) at siass.cc:783
5  0x004be59b in avd_susi_del_send (susi=0x78c670) at siass.cc:714
6  0x004af12e in avd_sg_nway_node_fail_stable (cb=0x751b80, su=0x800470, susi=0x0) at sg_nway_fsm.cc:3022
7  0x004b025d in avd_sg_nway_node_fail_sg_realign (cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:3493
8  0x004a8042 in SG_NWAY::node_fail (this=0x797c50, cb=0x751b80, su=0x800470) at sg_nway_fsm.cc:497
9  0x004b209e in sg_su_failover_func (su=0x800470) at sgproc.cc:525
10 0x004b2d16 in avd_su_oper_state_evh (cb=0x751b80, evt=0x7f66f4002940) at sgproc.cc:838
11 0x00450ba9 in process_event (cb_now=0x751b80, evt=0x7f66f4002940) at main.cc:768
12 0x004508cd in main_loop () at main.cc:689
13 0x00450e43 in main (argc=2, argv=0x7fff0f81ab18) at main.cc:841









[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-12 Thread A V Mahesh (AVM)
- Attachments has changed:

Diff:



--- old
+++ new
@@ -1 +1,2 @@
 logs.tgz (84.1 kB; application/x-compressed-tar)
+tcp_user_timeout_2014.patch (5.5 kB; application/octet-stream)



- **Comment**:

Even though my kernel is newer than 2.6.37 (3.0.13-0.27-default), the system headers on my machine somehow do not define the `TCP_USER_TIMEOUT` TCP socket option.

So could someone please test the attached `tcp_user_timeout_2014.patch` and report the results/observations?

Please also try tuning `DTM_TCP_USER_TIMEOUT=1500` in /etc/opensaf/dtmd.conf to higher and lower values, and test with each.
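For reference, `TCP_USER_TIMEOUT` is a Linux TCP-level socket option (since kernel 2.6.37) that takes an `unsigned int` in milliseconds: the maximum time transmitted data may remain unacknowledged before the kernel forcibly closes the connection with `ETIMEDOUT`. When the C library headers predate the option, a common workaround is to fall back to the kernel's numeric value, 18. The sketch below shows that pattern. It is not taken from `tcp_user_timeout_2014.patch`; the helper names are hypothetical, and it is only an assumption that dtmd passes `DTM_TCP_USER_TIMEOUT` through to this option as milliseconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Older libc headers may lack this constant even when the running kernel
 * (>= 2.6.37) supports it; 18 is the value in the kernel's linux/tcp.h. */
#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18
#endif

/* Hypothetical helper: cap how long sent data may stay unacknowledged
 * before the kernel aborts the connection with ETIMEDOUT.
 * timeout_ms == 0 restores the system default. Returns 0, or -1 with errno. */
static int set_tcp_user_timeout(int fd, unsigned int timeout_ms) {
  return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms,
                    sizeof(timeout_ms));
}

/* Hypothetical helper: read the option back to confirm the kernel took it. */
static int get_tcp_user_timeout(int fd, unsigned int *timeout_ms) {
  socklen_t len = sizeof(*timeout_ms);
  return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}
```

On a kernel that does not support the option, `setsockopt` fails (typically with `ENOPROTOOPT`), so the return value is worth checking and logging.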



---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Mon Sep 12, 2016 04:34 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


In 20% of the cases, a "reboot -f" on controller2 is not detected or acted on. The mds.log shows:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>

Still, there is nothing in the syslog indicating that controller2 has left the cluster. This is with the TCP transport.
When the node comes back online (without OpenSAF being started), controller1 finally notices and fails over the applications.

When the reboot is not detected, the TCP keepalives stop and the connection goes into retransmissions instead. I have attached two tshark sessions captured on controller1, covering traffic between controller1 and controller2. The failed reboot detection is captured in "ctrl2_failed_detection.trc", and a working detection in "ctrl2_working.trc". I have also attached all logs in /var/log/opensaf and the syslog (all from controller1).

It appears that we are hitting something similar to http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect.

// Jonas
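On Linux, the keepalive timer does not send probes while the connection has unacknowledged data in flight; stalled segments are handled by the retransmission timer instead, which with the default `net.ipv4.tcp_retries2` of 15 can keep retrying for roughly 15 minutes. That is consistent with the capture described above, where keepalives stop and retransmissions take over. The sketch below shows the standard per-socket keepalive knobs for comparison; the helper name and any values passed to it are illustrative assumptions, not dtmd's actual configuration. Because these knobs do not shorten the retransmission phase, `TCP_USER_TIMEOUT` (or a lower `tcp_retries2`) is the setting that bounds detection time in this scenario.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes TCP_KEEPIDLE etc. under strict -std modes */
#endif
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive with an idle time, probe interval
 * and probe count (Linux-specific TCP_KEEP* options, times in seconds).
 * Returns 0 on success, or -1 on the first failing setsockopt (errno set).
 * Note: probes are only sent while no data is outstanding. */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int probes) {
  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s,
                 sizeof(intvl_s)) != 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0)
    return -1;
  return 0;
}
```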

