date:20160908

[tickets] [opensaf:tickets] #2007 EVT: Service got hanged for 2 hours after saEvtEventPublish

2016-09-08 Thread A V Mahesh (AVM)

- **Milestone**: 5.1.RC1 --> future
- **Comment**:

Able to reproduce the problem; it does not look like a newly introduced issue.

This looks like multiple threads concurrently calling saEvtChannelClose() and 
saEvtEventRetentionTimeClear().

=

(gdb) bt
#0  0x7f757034ea00 in sem_wait () from /lib64/libpthread.so.0
#1  0x7f756f496a62 in hm_block_me () from /usr/lib64/libopensaf_core.so.0
#2  0x7f756f496bdd in ncshm_destroy_hdl () from /usr/lib64/libopensaf_core.so.0
#3  0x7f7570b7ba17 in eda_channel_hdl_rec_del () from /usr/lib64/libSaEvt.so.1
#4  0x7f7570b76d24 in saEvtChannelClose () at eda_saf_api.c:895
#5  0x00427c57 in tet_saEvtChannelClose (ptrChannelHandle=0x659710) at src/tet_edsv_wrappers.c:198
#6  0x0040ce15 in tet_RetentionTimeClear_Thread () at src/tet_eda.c:4790
#7  0x0040eb3e in tet_invoketp (icnum=300, tpnum=1) at src/tet_eda.c:6279
#8  0x00429aff in call_1tp (icnum=300, tpnum=1, testnum=300) at tcm_main.c:581
#9  0x0042a0b5 in call_tps (tpcount=, icnum=) at tcm_main.c:477
#10 tet_tcm_main (argc=, argv=) at tcm_main.c:432
#11 0x0042c0fd in main (argc=1082677280, argv=0x80) at main.c:83
(gdb) generate-core-file
Saved corefile core.6197
(gdb) bt full
#0  0x7f757034ea00 in sem_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x7f756f496a62 in hm_block_me () from /usr/lib64/libopensaf_core.so.0
        mbcsv_init_process_req_func = {0x7f756f49b720, 0x7f756f49d000, 0x7f756f49bcd0, 0x7f756f49bbe0,
          0x7f756f49bde0, 0x7f756f49c140, 0x7f756f49c2d0, 0x7f756f49c5b0,
          0x7f756f49c8c0, 0x7f756f49ca60, 0x7f756f49ba20, 0x7f756f49ccc0}
#2  0x7f756f496bdd in ncshm_destroy_hdl () from /usr/lib64/libopensaf_core.so.0
        mbcsv_init_process_req_func = {0x7f756f49b720, 0x7f756f49d000, 0x7f756f49bcd0, 0x7f756f49bbe0,
          0x7f756f49bde0, 0x7f756f49c140, 0x7f756f49c2d0, 0x7f756f49c5b0,
          0x7f756f49c8c0, 0x7f756f49ca60, 0x7f756f49ba20, 0x7f756f49ccc0}
#3  0x7f7570b7ba17 in eda_channel_hdl_rec_del () from /usr/lib64/libSaEvt.so.1
        s_agent_startup_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
          __size = '\000' , __align = 0}
        eda_use_count = 1
        gl_eda_hdl = 4290773003
#4  0x7f7570b76d24 in saEvtChannelClose () at eda_saf_api.c:895
        gl_eda_hdl = 4290773003
#5  0x00427c57 in tet_saEvtChannelClose (ptrChannelHandle=0x659710) at src/tet_edsv_wrappers.c:198
        try_again_count = 0
#6  0x0040ce15 in tet_RetentionTimeClear_Thread () at src/tet_eda.c:4790
No locals.
#7  0x0040eb3e in tet_invoketp (icnum=300, tpnum=1) at src/tet_eda.c:6279
No locals.
#8  0x00429aff in call_1tp (icnum=300, tpnum=1, testnum=300) at tcm_main.c:581
No locals.
#9  0x0042a0b5 in call_tps (tpcount=, icnum=) at tcm_main.c:477
        testnum = -512
        tpnum = 1
#10 tet_tcm_main (argc=, argv=) at tcm_main.c:432
        cp = 
        icp = 0x65a5d0
        iccount = 
        tpcount = 1
        icnum = 300
        rc = 
        nsys = 0
#11 0x0042c0fd in main (argc=1082677280, argv=0x80) at main.c:83
No locals.
(gdb) bt thread apply all
A syntax error in expression, near `apply all'.
(gdb)  thread apply all bt
 
Thread 4 (Thread 0x7f7570f9eb00 (LWP 6198)):
#0  0x7f756fe224f6 in poll () from /lib64/libc.so.6
#1  0x7f756f485fd1 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0
#2  0x7f756f48d9ef in ncs_tmr_wait () from /usr/lib64/libopensaf_core.so.0
#3  0x7f75703487b6 in start_thread () from /lib64/libpthread.so.0
#4  0x7f756fe2b9cd in clone () from /lib64/libc.so.6
#5  0x in ?? ()
 
Thread 3 (Thread 0x7f7570f6bb00 (LWP 6199)):
#0  0x7f756fe224f6 in poll () from /lib64/libc.so.6
#1  0x7f756f4c317e in mdtm_process_recv_events () from /usr/lib64/libopensaf_core.so.0
#2  0x7f75703487b6 in start_thread () from /lib64/libpthread.so.0
#3  0x7f756fe2b9cd in clone () from /lib64/libc.so.6
#4  0x in ?? ()
 
Thread 2 (Thread 0x7f756ef45700 (LWP 6200)):
#0  0x7f756fdf9c0d in nanosleep () from /lib64/libc.so.6
#1  0x7f756fdf9a2c in sleep () from /lib64/libc.so.6
#2  0x00428e76 in eda_selection_thread () at src/tet_edsv_wrappers.c:643
#3  0x7f75703487b6 in start_thread () from /lib64/libpthread.so.0
#4  0x7f756fe2b9cd in clone () from /lib64/libc.so.6
#5  0x in ?? ()
 
Thread 1 (Thread 0x7f7570f6e720 (LWP 6197)):
#0  0x7f757034ea00 in sem_wait () from /lib64/libpthread.so.0
#1  0x7f756f496a62 in hm_block_me () from /usr/lib64/libopensaf_core.so.0
#2  0x7f756f496bdd in ncshm_destroy_hdl () from /usr/lib64/libopensaf_core.so.0
#3  0x7f7570b7ba17 in eda_channel_hdl_rec_del () from
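
The backtrace above shows the main thread parked in sem_wait() underneath ncshm_destroy_hdl(). As a rough illustration of how a use-counted handle can end up in that state, here is a minimal Python sketch. The `Handle` class and its `take`/`give`/`destroy` methods are hypothetical names for this example, not the OpenSAF handle manager API, and the idea that the hang comes from another thread that took the handle and never gave it back is an assumption, not something the ticket confirms.

```python
import threading


class Handle:
    """Toy use-counted handle: destroy() waits until every user has left."""

    def __init__(self):
        self._cond = threading.Condition()
        self._users = 0
        self._destroying = False

    def take(self):
        """Register a user; refused once destruction has started."""
        with self._cond:
            if self._destroying:
                return False
            self._users += 1
            return True

    def give(self):
        """Unregister a user and wake a waiting destroy()."""
        with self._cond:
            self._users -= 1
            self._cond.notify_all()

    def destroy(self, timeout=None):
        """Block until no users remain; False means the wait timed out."""
        with self._cond:
            self._destroying = True
            return self._cond.wait_for(lambda: self._users == 0, timeout)
```

With `timeout=None`, `destroy()` waits for as long as any user keeps the handle taken, which is the kind of indefinite block the ticket title describes.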

[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-08 Thread A V Mahesh (AVM)

>> It appears to me that we are hitting something similar to 
>> "http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect"

Have you customized the configuration below in /etc/opensaf/dtmd.conf?

In the case above, the disconnection is triggered by the keepalive timer (idle 
time = 40 sec, 4 probes, probe interval = 10 sec).

==

# so_keepalive: Enable sending of keep-alive messages on connection-oriented
# sockets. Expects an integer boolean flag
# Note that without this set none of the tcp options will matter
DTM_SKEEPALIVE=1

#
# tcp_keepalive_time: The time (in seconds) the connection needs to remain
# idle before TCP starts sending keepalive probes
# Optional
DTM_TCP_KEEPIDLE_TIME=2

==
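
To make the timing concrete, here is a rough Python sketch of the keepalive arithmetic for the values quoted above (idle 40 sec, 4 probes, 10 sec apart), together with the Linux socket options that correspond to them. It assumes Linux. `TCP_USER_TIMEOUT` is included only as one commonly used way to bound the retransmission case described in the Stack Overflow link; nothing in this thread says dtmd sets it. The function names are made up for this example.

```python
import socket


def keepalive_detection_time(idle_s, probes, interval_s):
    """Approximate seconds of silence before an idle, dead peer is dropped."""
    return idle_s + probes * interval_s


def keepalive_options(idle_s, probes, interval_s, user_timeout_ms=None):
    """Build (level, option, value) tuples for setsockopt() on Linux."""
    opts = [
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
        (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s),
        (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes),
        (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s),
    ]
    if user_timeout_ms is not None:
        # Assumption: caps how long unacknowledged data may stay in flight.
        opts.append((socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms))
    return opts


def apply_options(sock, opts):
    """Apply the tuples from keepalive_options() to an existing socket."""
    for level, name, value in opts:
        sock.setsockopt(level, name, value)
```

With the quoted values the keepalive path needs about 40 + 4 * 10 = 80 seconds of silence. Keepalive probes are only sent on an idle connection, so while unacknowledged data is in flight the retransmission timer decides when the connection is given up.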




---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Fri Sep 09, 2016 04:16 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar)


In 20% of the cases a "reboot -f" on controller2 is not detected and acted on. 
The mds.log shows:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: Anchor=<0x0002020f,1790>
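
As a small aid for reading the excerpt above, the following Python sketch computes the spacing between the "Timeout or Error occured" lines. The three log lines are copied from the excerpt; the helper name `timeout_intervals` is made up for this example. Reading the spacing as a repeating send with a timeout of about 10 seconds is an inference from the timestamps, not something stated in the ticket.

```python
import re
from datetime import timedelta

LOG = """\
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error occured
"""

# Matches the H:MM:SS.microseconds part of an mds.log line.
STAMP = re.compile(r"(\d+):(\d+):(\d+)\.(\d+)")


def timeout_intervals(text):
    """Return the seconds between consecutive 'Timeout or Error' lines."""
    times = []
    for line in text.splitlines():
        if "Timeout or Error" not in line:
            continue
        h, m, s, us = (int(x) for x in STAMP.search(line).groups())
        times.append(timedelta(hours=h, minutes=m, seconds=s, microseconds=us))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Calling `timeout_intervals(LOG)` gives two gaps of roughly 10.1 seconds each, so the sender keeps retrying at a steady pace while controller2 is down.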

Still, there is nothing in the syslog indicating that controller2 has left the 
cluster. This is for TCP.
When the node comes back online (without opensaf being started), controller1 
finally notices and fails over the apps. 

When the reboot is not detected, the TCP keepalives stop and the connection goes 
into retransmits instead. I have attached two tshark sessions captured on 
controller1, covering the traffic between controller1 and controller2. The failed 
reboot detection is captured in "ctrl2_failed_detection.trc", and the working 
detection is in "ctrl2_working.trc". I have also attached all logs from 
/var/log/opensaf and the syslog (all from controller1).

It appears to me that we are hitting something similar to 
"http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect".

// Jonas


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-08 Thread A V Mahesh (AVM)

- **status**: unassigned --> assigned
- **assigned_to**: A V Mahesh (AVM)
- **Component**: unknown --> dtm
- **Part**: lib --> -
- **Priority**: critical --> major
- **Comment**:

Can you please provide details of your cluster environment (OS / VM / container)?



---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Thu Sep 08, 2016 06:20 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar)



[tickets] [opensaf:tickets] #2008 AMFND: Coredump while shutting down

2016-09-08 Thread Minh Hon Chau

changeset:   8031:1f2d5df7d8b7
branch:      opensaf-5.1.x
parent:      8028:4bd26e7de69c
user:        minh-chau
date:        Fri Sep 09 08:02:39 2016 +1000
summary:     AMFND: Fix amfnd coredump if sc failover while shutting down [#2008]

changeset:   8030:1412efc8c888
tag:         qparent
user:        minh-chau
date:        Fri Sep 09 07:55:38 2016 +1000
summary:     AMFND: Fix amfnd coredump if sc failover while shutting down [#2008]




---

** [tickets:#2008] AMFND: Coredump while shutting down**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 12:35 PM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 08, 2016 10:09 PM UTC
**Owner:** nobody
**Attachments:**

- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/2008/attachment/osafamfnd) (135.3 kB; application/octet-stream)


During the cluster shutdown phase, if both controllers do not shut down fast 
enough and the active controller goes down first, an SC failover can 
happen. In this situation, avnd_last_step_clean() gets called twice and a 
coredump is generated.

This is most likely because the records in nodeid_mdsdest_db and hctypedb are 
deleted while those containers still own the keys. Thus, the second call of 
avnd_last_step_clean() causes the coredump.

BT

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/lib/opensaf/osafamfnd --tracemask=0x'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f56a225bcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in 
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
(gdb) bt
#0  0x7f56a225bcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f56a225f0d8 in __GI_abort () at abort.c:89
#2  0x7f56a2298394 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f56a23a6b28 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x7f56a22a466e in malloc_printerr (ptr=, str=0x7f56a23a2c19 "free(): invalid pointer", action=1) at malloc.c:4996
#4  _int_free (av=, p=, have_lock=0) at malloc.c:3840
#5  0x0043a616 in _M_dispose (__a=..., this=) at /usr/include/c++/4.8/bits/basic_string.h:249
#6  ~basic_string (this=0x1d5fa70, __in_chrg=) at /usr/include/c++/4.8/bits/basic_string.h:539
#7  ~avnd_hctype_tag (this=0x1d5fa70, __in_chrg=) at ../../../../../osaf/services/saf/amf/amfnd/include/avnd_hc.h:46
#8  avnd_last_step_clean (cb=cb@entry=0x665940 <_avnd_cb>) at term.cc:101
#9  0x00436ee1 in avnd_su_si_oper_done (cb=cb@entry=0x665940 <_avnd_cb>, su=0x1d5d000, si=si@entry=0x0) at susm.cc:1169
#10 0x00416629 in avnd_comp_csi_assign_done (cb=cb@entry=0x665940 <_avnd_cb>, comp=comp@entry=0x1d63260, csi=csi@entry=0x0) at comp.cc:1642
#11 0x00416a6e in avnd_comp_cmplete_all_assignment (cb=cb@entry=0x665940 <_avnd_cb>, comp=comp@entry=0x1d63260) at comp.cc:2567
#12 0x0040bb9b in avnd_comp_clc_terming_cleansucc_hdler (cb=cb@entry=0x665940 <_avnd_cb>, comp=comp@entry=0x1d63260) at clc.cc:2328
#13 0x0040f6ba in avnd_comp_clc_fsm_run (cb=cb@entry=0x665940 <_avnd_cb>, comp=comp@entry=0x1d63260, ev=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_SUCC) at clc.cc:876
#14 0x0040ffca in avnd_evt_clc_resp_evh (cb=0x665940 <_avnd_cb>, evt=0x7f568c0008c0) at clc.cc:414
#15 0x00425f5f in avnd_evt_process (evt=0x7f568c0008c0) at main.cc:625
#16 avnd_main_process () at main.cc:576
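
The failure mode described above (a cleanup routine that runs twice over a container whose entries were freed but never removed) can be sketched in a few lines of Python. This is only an analogy: Python has no manual free, so a double free is modelled by a `Record.release()` that raises on the second call. `Record`, `clean_buggy` and `clean_safe` are hypothetical names and do not reflect the actual amfnd code or the committed fix.

```python
class Record:
    """Stand-in for a heap object; releasing it twice is an error."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def release(self):
        if self.released:
            raise RuntimeError(f"double release of {self.name}")
        self.released = True


def clean_buggy(db):
    # Releases every record but leaves the stale entries in the container,
    # so a second call walks the same records again.
    for record in db.values():
        record.release()


def clean_safe(db):
    # Removes each entry from the container before releasing it, which makes
    # a repeated call a harmless no-op on an empty container.
    while db:
        _, record = db.popitem()
        record.release()
```

Calling `clean_buggy()` twice on the same dictionary raises on the second pass, while `clean_safe()` can be called any number of times.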




[tickets] [opensaf:tickets] #2008 AMFND: Coredump while shutting down

2016-09-08 Thread Minh Hon Chau

- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau -->  nobody 



---

** [tickets:#2008] AMFND: Coredump while shutting down**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 12:35 PM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 08, 2016 12:03 PM UTC
**Owner:** nobody
**Attachments:**

- [osafamfnd](https://sourceforge.net/p/opensaf/tickets/2008/attachment/osafamfnd) (135.3 kB; application/octet-stream)



[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-08 Thread Jonas Arndt




---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Thu Sep 08, 2016 06:20 PM UTC
**Owner:** nobody
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar)



[tickets] [opensaf:tickets] #2000 osaf: Cluster reset happend due to msgd crashed on both the controller

2016-09-08 Thread Anders Widell

- **Component**: osaf --> msg



---

** [tickets:#2000] osaf: Cluster reset happend due to msgd crashed on both the controller**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Wed Sep 07, 2016 09:38 AM UTC
**Owner:** nobody
**Attachments:**

- [Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog) (716.7 kB; application/octet-stream)
- [Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog) (696.4 kB; application/octet-stream)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :
--
Cluster reset happened because the assertion 
'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.

Steps followed & Observed behaviour
--
1.  Invoked failover.
2.  After a few successful failovers, the new active controller rebooted because 
the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. The 
previous active controller was still joining the cluster in the standby role at 
that point, which resulted in a cluster reset. 
[Timeline: Sep  6 00:13:02 sofo-s2]

Sep  6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, dest:13)
Sep  6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed.
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: NO 'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: ER safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep  6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:
1. Syslog attached
2. msgnd and msgd traces were not enabled



[tickets] [opensaf:tickets] #1954 log: assertion failed in log_stream_close

2016-09-08 Thread Vu Minh Nguyen

- **status**: review --> fixed
- **assigned_to**: Vu Minh Nguyen -->  nobody 
- **Milestone**: 4.7.2 --> 5.0.1
- **Comment**:

changeset:   8029:3417fcd840a3
tag:         tip
parent:      8026:eed08ce4437e
user:        Vu Minh Nguyen
date:        Thu Sep 08 19:23:52 2016 +0700
summary:     log: assertion failed in log_stream_close [#1954]

changeset:   8028:4bd26e7de69c
branch:      opensaf-5.1.x
parent:      8025:5c1dfa0c9bf1
user:        Vu Minh Nguyen
date:        Thu Sep 08 19:20:48 2016 +0700
summary:     log: assertion failed in log_stream_close [#1954]

changeset:   8027:bc9afc86a424
branch:      opensaf-5.0.x
parent:      8024:4e2638e8f818
user:        Vu Minh Nguyen
date:        Thu Sep 08 19:18:22 2016 +0700
summary:     log: assertion failed in log_stream_close [#1954]




---

** [tickets:#1954] log: assertion failed in log_stream_close**

**Status:** fixed
**Milestone:** 5.0.1
**Created:** Tue Aug 16, 2016 09:54 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu Aug 18, 2016 01:52 AM UTC
**Owner:** nobody


In `lgs_client_delete()`, `log_stream_close()` is called without a NULL check.
If the stream is NULL, the node is rebooted because the assertion fails.

> Aug 16 13:26:04 SC-1 osaflogd[6016]: lgs_stream.cc:759: log_stream_close: Assertion 'stream != NULL' failed.

This ticket adds the missing protection.
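
The kind of guard meant here can be sketched as follows. This is a toy Python model, not the lgs code: `log_stream_close` borrows its name from the ticket, but its body, the `client_delete` helper and the dictionary layout are hypothetical and exist only to show the check-before-close pattern.

```python
def log_stream_close(stream):
    # Mirrors the assertion quoted in the ticket: a missing stream is fatal here.
    assert stream is not None, "stream != NULL"
    stream["open"] = False


def client_delete(client):
    """Close every stream of a client, skipping slots that hold no stream."""
    closed = 0
    for stream in client["streams"]:
        if stream is None:  # the guard: never hand a missing stream to close
            continue
        log_stream_close(stream)
        closed += 1
    return closed
```

Without the `if stream is None` check, the first empty slot would trip the assertion, which in the real service leads to the node reboot described above.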



[tickets] [opensaf:tickets] #1969 smf: One step upgrade with cluster reboot does not wait for nodes to start

2016-09-08 Thread elunlen

When SMF is started after a reboot and is to continue with a campaign, it 
checks that all nodes that are part of the campaign are available. In this case 
the campaign has requested a cluster reboot after the procedure execute state 
is completed. After the restart the campaign shall continue with the procedure 
wrap-up state. The preparation for this includes asking for the node Id of all 
nodes that are part of the campaign, and when all nodes have answered the 
wrap-up will be done.
The problem here is that each node is checked for node up with a 
timeout of 10s (this is hard coded), and if a node is not up within this time 
the campaign will fail.
•   Each node has a timeout of 10s
•   Nodes are checked in sequence, meaning that the last node checked may 
have a longer time to start if any waiting was done for any of the 
previous ones
•   The check starts when smfd has started on the active SC node; some 
of the other nodes may already have been started by then and some not
Altogether this means that the behavior is unpredictable, and since the worst 
case gives a rather short timeout it may also be considered unstable.
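
The effect of checking the nodes one after another can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the function name, the list of join times and the idea that the check proceeds as soon as a node is up are all made up for this example and are not taken from the smfd code. Only the 10 second per-node timeout comes from the description above.

```python
def first_node_timing_out(up_times, per_node_timeout=10.0):
    """Walk the nodes in order, as a sequential 'node up' check would.

    up_times[i] is when node i joins, in seconds after the check starts.
    Returns the index of the first node that misses its timeout, or None.
    """
    now = 0.0
    for index, up_at in enumerate(up_times):
        if up_at > now + per_node_timeout:
            return index
        # Waiting for this node moves the clock forward for all later nodes.
        now = max(now, up_at)
    return None
```

With join times of 8, 15 and 24 seconds the check passes when the nodes are examined in that order, because each wait extends the time available to the next node. If the node that joins at 24 seconds is examined first, the same cluster fails immediately, which is the unpredictability pointed out above.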

For 2) I suggest the following to be done:
1.  Create a temporary (quick) fix by just using a longer (hard coded) 
timeout for reboot upgrades, to be released with 5.1 (defect ticket).
Will this create any NBC problem?
2.  Define and implement better handling of this, e.g. by making it 
possible to configure the timeout via a new attribute in the smf configuration 
object. This can be released as an enhancement in 5.2.
Any better suggestions?



---

** [tickets:#1969] smf: One step upgrade with cluster reboot does not wait for nodes to start**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Wed Aug 24, 2016 01:01 PM UTC by elunlen
**Last Updated:** Thu Sep 01, 2016 09:50 AM UTC
**Owner:** nobody


When using the one step upgrade feature with a cluster reboot, all nodes will 
restart, including the SC-nodes. This is done as the last action in the upgrade 
step. After the active SC-node is up again, SMF will continue with the procedure 
wrapup. When collecting information in order to prepare the wrapup, the node 
destination for all nodes in the campaign is requested. However, this 
information can only be collected from nodes that are started and have joined 
the cluster (unlocked).
The problem is that SMF does not seem to wait in order to give all nodes a chance 
to join the cluster, and if SMF fails to get the node destination from any of the 
nodes the campaign will fail, as seen in the log below. When reading the node 
destination there is a 10 sec “try again” loop waiting for “node up” for each 
node. It is not unlikely that the active SC-node comes up before some of the 
other nodes and that it will take more than 10 sec after that before some of 
the other nodes join the cluster. If that's the case the campaign will fail.



[tickets] [opensaf:tickets] #2013 IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT

2016-09-08 Thread Chani Srivastava




---

** [tickets:#2013] IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Thu Sep 08, 2016 12:10 PM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 12:10 PM UTC
**Owner:** nobody
**Attachments:**

- [SearchTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2013/attachment/SearchTmOut.zip) (883.9 kB; application/zip)


OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes

Summary:
Steps to Reproduce
1. Create a runtime/config object
2. Do SearchInitialize()
3. Delete the object created in Step 1
4. Do SearchNext()
5. Do SearchNext() again


Observed Behavior:
Step 4 returns SA_AIS_ERR_TIMEOUT (expected)
Step 5 returns SA_AIS_ERR_BAD_HANDLE **(SA_AIS_ERR_NOT_EXIST is expected)**

**Note: Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached



[tickets] [opensaf:tickets] #2008 AMFND: Coredump while shutting down

2016-09-08 Thread Minh Hon Chau

- **status**: assigned --> review



---

** [tickets:#2008] AMFND: Coredump while shutting down**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 12:35 PM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 08, 2016 11:30 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**

- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/2008/attachment/osafamfnd)
 (135.3 kB; application/octet-stream)


During the cluster shutdown phase, if both controllers do not shut down fast 
enough and the active controller goes down first, an SC failover can happen. 
In this situation avnd_last_step_clean() gets called twice and a coredump is 
generated.

This is most likely because the records in nodeid_mdsdest_db and hctypedb are 
deleted while those containers still own the keys. Thus, the second call of 
avnd_last_step_clean() causes the coredump.
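
To make the suspected failure mode concrete, here is a minimal sketch of a cleanup routine that is safe to call twice. It is not the amfnd code: `HcType`, `HcTypeDb` and `clean_db` are hypothetical stand-ins, assuming a container that maps keys to heap-allocated records. The point is that the entries are dropped together with the records, so a second call finds nothing left to delete.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a health-check type record.
struct HcType {
  std::string name;
};

// Hypothetical stand-in for a database that owns heap-allocated records.
using HcTypeDb = std::map<std::string, HcType*>;

// Deletes every record and then removes the entries that pointed to them.
// After the first call the container is empty, so a second call is a no-op
// instead of deleting the same pointers again.
void clean_db(HcTypeDb& db) {
  for (auto& entry : db) {
    delete entry.second;
  }
  db.clear();
}
```

If only the `delete` were done and the entries stayed in the map, a second pass would free the same pointers again, which matches the "free(): invalid pointer" abort in the backtrace.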

BT

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/lib/opensaf/osafamfnd --tracemask=0x'.
Program terminated with signal SIGABRT, Aborted.
0  0x7f56a225bcc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Traceback (most recent call last):
  File 
"/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", 
line 63, in 
from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
(gdb) bt
0  0x7f56a225bcc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56

1  0x7f56a225f0d8 in __GI_abort () at abort.c:89

2  0x7f56a2298394 in __libc_message (do_abort=do_abort@entry=1, 
fmt=fmt@entry=0x7f56a23a6b28 "*** Error in `%s': %s: 0x%s ***\n")
at ../sysdeps/posix/libc_fatal.c:175

3  0x7f56a22a466e in malloc_printerr (ptr=, 
str=0x7f56a23a2c19 "free(): invalid pointer", action=1) at malloc.c:4996

4  _int_free (av=, p=, have_lock=0) at 
malloc.c:3840

5  0x0043a616 in _M_dispose (__a=..., this=)
at /usr/include/c++/4.8/bits/basic_string.h:249

6  ~basic_string (this=0x1d5fa70, __in_chrg=)
at /usr/include/c++/4.8/bits/basic_string.h:539

7  ~avnd_hctype_tag (this=0x1d5fa70, __in_chrg=)
at ../../../../../osaf/services/saf/amf/amfnd/include/avnd_hc.h:46

8  avnd_last_step_clean (cb=cb@entry=0x665940 <_avnd_cb>) at term.cc:101

9  0x00436ee1 in avnd_su_si_oper_done (cb=cb@entry=0x665940 <_avnd_cb>, 
su=0x1d5d000, 
si=si@entry=0x0) at susm.cc:1169

10 0x00416629 in avnd_comp_csi_assign_done (cb=cb@entry=0x665940 
<_avnd_cb>, 
comp=comp@entry=0x1d63260, csi=csi@entry=0x0) at comp.cc:1642

11 0x00416a6e in avnd_comp_cmplete_all_assignment (cb=cb@entry=0x665940 
<_avnd_cb>, 
comp=comp@entry=0x1d63260) at comp.cc:2567

12 0x0040bb9b in avnd_comp_clc_terming_cleansucc_hdler 
(cb=cb@entry=0x665940 <_avnd_cb>, 
comp=comp@entry=0x1d63260) at clc.cc:2328

13 0x0040f6ba in avnd_comp_clc_fsm_run (cb=cb@entry=0x665940 
<_avnd_cb>, 
comp=comp@entry=0x1d63260, ev=AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_SUCC) at 
clc.cc:876

14 0x0040ffca in avnd_evt_clc_resp_evh (cb=0x665940 <_avnd_cb>, 
evt=0x7f568c0008c0)
at clc.cc:414

15 0x00425f5f in avnd_evt_process (evt=0x7f568c0008c0) at main.cc:625

16 avnd_main_process () at main.cc:576




[tickets] [opensaf:tickets] #1985 log: cppcheck version 1.75 find errors in logsv

2016-09-08 Thread Vu Minh Nguyen

- **status**: review --> fixed
- **assigned_to**: Canh Truong -->  nobody 
- **Milestone**: 4.7.2 --> 5.0.1
- **Comment**:

changeset:   8026:eed08ce4437e
tag: tip
parent:  8020:e5f162184bbd
user:Canh Van Truong 
date:Thu Sep 08 18:59:51 2016 +0700
summary: log: fix errors reported by cppcheck version 1.75 [#1985]

changeset:   8025:5c1dfa0c9bf1
branch:  opensaf-5.1.x
parent:  8021:68b29ac33324
user:Canh Van Truong 
date:Thu Sep 08 18:59:51 2016 +0700
summary: log: fix errors reported by cppcheck version 1.75 [#1985]

changeset:   8024:4e2638e8f818
branch:  opensaf-5.0.x
parent:  8022:2139f3e6b37b
user:Canh Van Truong 
date:Thu Sep 08 18:59:51 2016 +0700
summary: log: fix errors reported by cppcheck version 1.75 [#1985]




---

** [tickets:#1985] log: cppcheck version 1.75 find errors in logsv**

**Status:** fixed
**Milestone:** 5.0.1
**Created:** Tue Aug 30, 2016 08:33 AM UTC by Canh Truong
**Last Updated:** Thu Sep 01, 2016 02:34 AM UTC
**Owner:** nobody


osaf/services/saf/logsv/lgs/lgs_clm.cc:120]: (error) Uninitialized variable: rc
osaf/services/saf/logsv/lgs/lgs_evt.cc:892]: (error) Invalid strncmp() argument 
nr 3. A non-boolean value is required.
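
The ticket does not quote the offending lines, so the following is only a sketch of the two patterns that cppcheck typically flags with these messages. `has_prefix` and `parse_flag` are hypothetical functions, not taken from logsv.

```cpp
#include <cstring>

// Pattern behind "Invalid strncmp() argument nr 3" (hypothetical example):
// the comparison slipped inside the call, so the third argument is the
// boolean result of `len == 0` instead of a length.
//   if (strncmp(text, prefix, len == 0)) { ... }

// Intended form: pass the length, then compare the result with zero.
bool has_prefix(const char* text, const char* prefix) {
  const std::size_t len = std::strlen(prefix);
  return std::strncmp(text, prefix, len) == 0;
}

// Pattern behind "Uninitialized variable: rc": give rc a defined value so
// that every path returns something meaningful.
int parse_flag(const char* text) {
  int rc = -1;  // defined default instead of an indeterminate value
  if (has_prefix(text, "on")) {
    rc = 1;
  } else if (has_prefix(text, "off")) {
    rc = 0;
  }
  return rc;
}
```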



[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-08 Thread Vu Minh Nguyen

- **status**: accepted --> review



---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 06:57 AM UTC
**Owner:** Vu Minh Nguyen
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)


OS : Suse PPC 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
no PBE )

Ntfd traces and syslog for both controllers attached
*  Ntf Application is running on system
*  Will update ticket with core dump 

Note: The timings on system are not synced. After every reboot node timings are 
modified

BT:
0  0x0fffa0848100 in .raise () from /lib64/libc.so.6
1  0x0fffa0849d10 in .abort () from /lib64/libc.so.6
2  0x0fffa0e34234 in osaf_abort (i_cause=7) at osaf_utility.c:27
3  0x1001a2f8 in NtfLogger::logNotification (this=0x100ba768, notif=
std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:247
4  0x10019e60 in NtfLogger::checkQueueAndLog (this=0x100ba768,
newNotif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at 
NtfLogger.cc:181
5  0x10019a74 in NtfLogger::log (this=0x100ba768, 
notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0,
isLocal=true) at NtfLogger.cc:137
6  0x1002b528 in NtfAdmin::processNotification (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc, notificationId=47) at 
NtfAdmin.cc:203
7  0x1002b938 in NtfAdmin::notificationReceived (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:257
8  0x1002ec20 in notificationReceived (clientId=62, 
notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800,
mdsCtxt=0x100bb3dc) at NtfAdmin.cc:1012
9  0x10006410 in proc_send_not_msg (cb=0x10073190 <_ntfs_cb>, 
evt=0x100bb3d0) at ntfs_evt.c:447
10 0x10006b28 in process_api_evt (evt=0x100bb3d0) at ntfs_evt.c:628
11 0x10006c38 in ntfs_process_mbx (mbx=0x10073190 <_ntfs_cb>) at 
ntfs_evt.c:660
12 0x1000b6f4 in main (argc=2, argv=0xfffc056a7f8) at ntfs_main.c:399


Active Controller:
May 26 19:41:16 linux-pvra osafntfd[24205]: **osaf_abort(7) called from 
0x1001a2f8 with errno=11**
May 26 19:41:16 linux-pvra osafamfnd[24243]: NO 
'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
May 26 19:41:16 linux-pvra osafamfnd[24243]: ER 
safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
May 26 19:41:16 linux-pvra osafamfnd[24243]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
May 26 19:41:16 linux-pvra opensaf_reboot: Rebooting local node; timeout=60
Jun  2 14:11:28 linux-pvra syslog-ng[1639]: syslog-ng starting up; 
version='2.0.9'


Ntf Trace:
May 26 19:41:16.767426 osafntfd [24205:lga_api.c:1190] TR logBufSize > 
strlen(logBuf) + 1
May 26 19:41:16.767436 osafntfd [24205:lga_api.c:1320] << saLogWriteLogAsync
Jun  2 14:11:47.153831 osafntfd [2958:ntfs_main.c:0181] >> initialize
Jun  2 14:11:47.175099 osafntfd [2958:ncs_main_pub.c:0220] TR
NCS:PROCESS_ID=2958




[tickets] [opensaf:tickets] #2002 CLM : Agent crashed for invalid check in buffer notification parameter

2016-09-08 Thread Mathi Naickan

- **status**: unassigned --> assigned
- **assigned_to**: Mathi Naickan



---

** [tickets:#2002] CLM : Agent crashed for invalid check in buffer notification 
parameter**

**Status:** assigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 08:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:15 AM UTC
**Owner:** Mathi Naickan


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4



Steps followed & Observed behaviour
--

-> Call the saClmClusterTrack_4 API with the CURRENT flag and the buffer 
parameter populated. Here the buffer parameter is populated by allocating 
sufficient memory for numberOfItems, but the notification contains garbage 
values.

The agent crashed with the following backtrace when the notification contains 
garbage values.

 -> #3  0x7f4ccb370c9f in osaf_extended_name_length (name=0x9d5e4e) at 
osaf_extended_name.c:139
-> #4  0x7f4cca9ff27c in clma_validate_flags_buf_4 (hdl_rec=0x97cbc0, 
flags=1 '\001', buf=0x97c190) at clma_api.c:183
->#5  0x7f4ccaa00fe5 in clmaclustertrack (clmHandle=4290772993, flags=1 
'\001', buf=0x0, buf_4=0x97c190) at clma_api.c:1032
->#6  0x7f4ccaa00d40 in saClmClusterTrack_4 (clmHandle=4290772993, flags=1 
'\001', buf=0x97c190) at clma_api.c:958


Expected behaviour
--
If the buffer parameter is NULL, CLM shall invoke a callback. If the buffer 
parameter is not NULL, CLM should check only the value of numberOfItems and 
evaluate whether the user has allocated sufficient memory.

With the #1906 changes, the contents of notification are also verified, but 
only the structure member numberOfItems should be verified.
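
A minimal sketch of the expected check, assuming simplified stand-in types: `Notification`, `NotificationBuffer`, `BufCheck` and `check_track_buffer` are hypothetical and mirror only the two fields discussed here, not the real CLM agent structures. The check looks at the pointer and at numberOfItems and never reads the notification entries themselves.

```cpp
#include <cstdint>

// Hypothetical stand-ins that mirror only the fields discussed in the ticket.
struct Notification {
  char opaque[64];  // contents may be garbage; never inspected below
};

struct NotificationBuffer {
  std::uint32_t numberOfItems;
  Notification* notification;
};

enum class BufCheck { kUseCallback, kOk, kNoSpace };

// Decides how to answer a CURRENT request without dereferencing any
// notification entry. cluster_size is the number of members to report.
BufCheck check_track_buffer(const NotificationBuffer* buf,
                            std::uint32_t cluster_size) {
  if (buf == nullptr) return BufCheck::kUseCallback;
  if (buf->notification != nullptr && buf->numberOfItems < cluster_size) {
    return BufCheck::kNoSpace;  // caller's array is too small
  }
  return BufCheck::kOk;
}
```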




[tickets] [opensaf:tickets] #2012 clm: inconsistant additionalText and lengthAdditionalText in notification construction

2016-09-08 Thread Vu Minh Nguyen

- **status**: accepted --> review



---

** [tickets:#2012] clm: inconsistant additionalText and lengthAdditionalText in 
notification construction**

**Status:** review
**Milestone:** 5.0.1
**Created:** Thu Sep 08, 2016 11:20 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu Sep 08, 2016 11:20 AM UTC
**Owner:** Vu Minh Nguyen


According to NTF AIS, `additionalText` must be consistent with 
`lengthAdditionalText`.

In the current code, CLM always sets the hard-coded `ADDITION_TEXT_LENGTH` as 
`lengthAdditionalText` regardless of what `additionalText` is.
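
A minimal sketch of deriving the length from the text instead of using a fixed constant. `additional_text_length` is a hypothetical helper, and it assumes that the length counts the terminating NUL and is capped at some maximum; neither assumption is confirmed by this ticket.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the length to store next to the text.
// Assumption: the length includes the terminating NUL and must not exceed
// max_len (for example the size of the destination buffer).
std::uint16_t additional_text_length(const char* text, std::uint16_t max_len) {
  if (text == nullptr) return 0;
  const std::size_t needed = std::strlen(text) + 1;
  return needed > max_len ? max_len : static_cast<std::uint16_t>(needed);
}
```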



[tickets] [opensaf:tickets] #2008 AMFND: Coredump while shutting down

2016-09-08 Thread Minh Hon Chau

I think the deletion of nodeid_mdsdest_db and hctypedb in 
avnd_last_step_clean() was introduced because of valgrind's complaints during 
"opensafd stop".
If those changes are taken out, valgrind reports these memory leaks:

==538== 16 bytes in 1 blocks are definitely lost in loss record 16 of 142
==538==at 0x4C2B0E0: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==538==by 0x42D149: avnd_nodeid_mdsdest_rec_add(avnd_cb_tag*, unsigned 
long) (proxydb.cc:55)
==538==by 0x42B793: avnd_evt_mds_avnd_up_evh(avnd_cb_tag*, avnd_evt_tag*) 
(proxy.cc:52)
==538==by 0x425F5E: avnd_evt_process (main.cc:625)
==538==by 0x425F5E: avnd_main_process() (main.cc:576)
==538==by 0x4058B2: main (main.cc:201)

==538== 1,592 (312 direct, 1,280 indirect) bytes in 13 blocks are definitely 
lost in loss record 135 of 142
==538==at 0x4C2B0E0: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==538==by 0x423828: hctype_create (hcdb.cc:160)
==538==by 0x423828: avnd_hctype_config_get(unsigned long long, std::string 
const&) (hcdb.cc:218)
==538==by 0x423B45: avnd_hc_config_get(avnd_comp_tag*) (hcdb.cc:119)
==538==by 0x41989A: avnd_comp_config_get_su(avnd_su_tag*) (compdb.cc:1559)
==538==by 0x4304FE: avnd_evt_avd_reg_su_evh(avnd_cb_tag*, avnd_evt_tag*) 
(su.cc:161)
==538==by 0x425F5E: avnd_evt_process (main.cc:625)
==538==by 0x425F5E: avnd_main_process() (main.cc:576)
==538==by 0x4058B2: main (main.cc:201)


---

** [tickets:#2008] AMFND: Coredump while shutting down**

**Status:** assigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 12:35 PM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 08, 2016 04:32 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**

- 
[osafamfnd](https://sourceforge.net/p/opensaf/tickets/2008/attachment/osafamfnd)
 (135.3 kB; application/octet-stream)




[tickets] [opensaf:tickets] #2012 clm: inconsistant additionalText and lengthAdditionalText in notification construction

2016-09-08 Thread Vu Minh Nguyen

- **summary**: clm: inconsistant additionalText and lengthAdditionalText in 
construct notification  --> clm: inconsistant additionalText and 
lengthAdditionalText in notification construction



---

** [tickets:#2012] clm: inconsistant additionalText and lengthAdditionalText in 
notification construction**

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Thu Sep 08, 2016 11:20 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu Sep 08, 2016 11:20 AM UTC
**Owner:** Vu Minh Nguyen




[tickets] [opensaf:tickets] #2012 clm: inconsistant additionalText and lengthAdditionalText in construct notification

2016-09-08 Thread Vu Minh Nguyen




---

** [tickets:#2012] clm: inconsistant additionalText and lengthAdditionalText in 
construct notification **

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Thu Sep 08, 2016 11:20 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu Sep 08, 2016 11:20 AM UTC
**Owner:** Vu Minh Nguyen




[tickets] [opensaf:tickets] #1995 AMF : amfd crashed while dumping AMF state

2016-09-08 Thread Praveen

- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Part**: - --> d



---

** [tickets:#1995] AMF : amfd crashed while dumping AMF state**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Fri Sep 02, 2016 08:42 AM UTC by Srikanth R
**Last Updated:** Fri Sep 02, 2016 08:42 AM UTC
**Owner:** Praveen


Changeset : 7997 5.1 FC

AMFD crashed while dumping the amf state, with the following command.

 immadm -a @safAmfService2020f -o 99 @safAmfService2020f
 
 
 Sep  2 12:51:26 CONTROLLER-2 osafamfd[2691]: NO unknown type: 
@safAmfService2020f
Sep  2 12:51:26 CONTROLLER-2 osafamfd[2691]: imm.cc:648: 
object_name_to_class_type: Assertion 'false' failed.
Sep  2 12:51:26 CONTROLLER-2 osafamfnd[2701]: WA AMF director unexpectedly 
crashed
Sep  2 12:51:26 CONTROLLER-2 osafamfnd[2701]: Rebooting OpenSAF NodeId = 131599 
EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131599, SupervisionTime = 60







[tickets] [opensaf:tickets] #1970 imm: immoitest testsuite 4 fails when CCB takes more than 2 seconds to commit

2016-09-08 Thread Hung Nguyen

- **status**: review --> fixed
- **Comment**:

default(5.2) [staging:e5f162]
changeset:   8020:e5f162184bbd
user:Hung Nguyen 
date:Fri Aug 26 13:29:26 2016 +0700
summary: imm: Remove the poll timeout in IMM testcases [#1970]

opensaf-5.1.x  [staging:68b29a]
changeset:   8021:68b29ac33324
user:Hung Nguyen 
date:Fri Aug 26 13:29:26 2016 +0700
summary: imm: Remove the poll timeout in IMM testcases [#1970]

opensaf-5.0.x  [staging:2139f3]
changeset:   8022:2139f3e6b37b
user:Hung Nguyen 
date:Fri Aug 26 13:29:26 2016 +0700
summary: imm: Remove the poll timeout in IMM testcases [#1970]

opensaf-4.7.x  [staging:eef359]
changeset:   8023:eef3593c3597
user:Hung Nguyen 
date:Fri Aug 26 13:29:26 2016 +0700
summary: imm: Remove the poll timeout in IMM testcases [#1970]




---

** [tickets:#1970] imm: immoitest testsuite 4 fails when CCB takes more than 2 
seconds to commit**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Thu Aug 25, 2016 06:09 AM UTC by Hung Nguyen
**Last Updated:** Sun Aug 28, 2016 06:25 AM UTC
**Owner:** Hung Nguyen


In classImplementerThreadMain(), poll() was invoked with a timeout of 2 seconds.
The test uses that timeout to stop the thread (i.e. the thread stops when 
there is no callback within 2 seconds).
But that also causes:
* The testcase fails if PBE takes more than 2 seconds to dump. The while() loop 
stops after 2 seconds, but then it fails to release the implementer name 
because the CCB is still active.
* The testcase is slow because it has to wait 2 seconds to stop the thread.
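
The changeset summary only says that the poll timeout was removed; how the test stops the thread afterwards is not shown in this mail. One possible approach, sketched here under the assumption of a second "stop" descriptor, is to block in poll() without a timeout and leave the loop only when the stop descriptor becomes readable. `dispatch_until_stopped` and both descriptors are hypothetical, and a one-byte read stands in for dispatching a callback.

```cpp
#include <cerrno>
#include <poll.h>
#include <unistd.h>

// Handles one-byte "events" from event_fd until stop_fd becomes readable.
// Pending events are drained before the stop request is honoured.
// Returns the number of events handled, or -1 on error.
int dispatch_until_stopped(int event_fd, int stop_fd) {
  struct pollfd fds[2];
  fds[0].fd = event_fd;
  fds[0].events = POLLIN;
  fds[1].fd = stop_fd;
  fds[1].events = POLLIN;
  int handled = 0;
  for (;;) {
    fds[0].revents = 0;
    fds[1].revents = 0;
    // Timeout -1: block until something is readable, no 2 second limit.
    if (poll(fds, 2, -1) < 0) {
      if (errno == EINTR) continue;
      return -1;
    }
    if (fds[0].revents & POLLIN) {
      char byte;
      if (read(event_fd, &byte, 1) != 1) return -1;
      ++handled;  // stand-in for dispatching one callback
      continue;
    }
    if (fds[1].revents & POLLIN) return handled;
    // Neither descriptor is readable: hang-up or error on one of them.
    return -1;
  }
}
```

With this shape the thread neither gives up while a slow commit is still in progress nor waits a fixed 2 seconds before it can be stopped.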

~~~
2016-08-02 21:45:43 SC-2 osafimmnd[437]: NO Create of class TestClassConfig is 
PERSISTENT.
2016-08-02 21:45:43 SC-2 osafimmnd[437]: NO Create of class TestClassRuntime is 
PERSISTENT.
2016-08-02 21:45:43 SC-2 osafimmnd[437]: NO Ccb 4925 COMMITTED (startup)
2016-08-02 21:45:43 SC-2 osafimmnd[437]: NO Ccb 4926 COMMITTED (om_setup)
2016-08-02 21:45:43 SC-2 osafimmnd[437]: NO Ccb 4927 COMMITTED (om_setup)
2016-08-02 21:45:43 SC-2 osafimmnd[437]: NO Implementer connected: 312 
(classImplementerThreadMain) <1170, 2020f>
2016-08-02 21:45:43 SC-2 osafimmnd[437]: NO implementer for class 
'TestClassConfig' is classImplementerThreadMain => 
class extent is safe.
2016-08-02 21:45:46 SC-2 osafimmnd[437]: NO ERR_BUSY: ccb 4928 is active on 
object Obj1,rdn=root of class 
TestClassConfig. Can not release class implementer
2016-08-02 21:45:46 SC-2 osafimmnd[437]: NO Implementer locally disconnected. 
Marking it as doomed 312 <1170, 2020f> 
(classImplementerThreadMain)
2016-08-02 21:45:46 SC-2 osafimmnd[437]: WA CCB 4928 is in critical state, can 
not abort
2016-08-02 21:45:46 SC-2 osafimmnd[437]: WA Will not terminate ccb 4928 in 
critical state 
~~~





[tickets] [opensaf:tickets] #2003 amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.

2016-09-08 Thread Praveen

- **status**: review --> fixed
- **Comment**:

changeset:   8016:77a9f5df113f
branch:  opensaf-4.7.x
user:praveen.malv...@oracle.com
date:Thu Sep 08 14:13:52 2016 +0530
summary: amfnd: send recovery request to amfd for term-failed su [#2003]

changeset:   8017:f15bc3868b81
branch:  opensaf-5.0.x
parent:  8014:0b491ef33bb8
user:praveen.malv...@oracle.com
date:Thu Sep 08 14:14:25 2016 +0530
summary: amfnd: send recovery request to amfd for term-failed su [#2003]

changeset:   8018:466142dde156
branch:  opensaf-5.1.x
parent:  8013:9acf7c9aecab
user:praveen.malv...@oracle.com
date:Thu Sep 08 14:14:49 2016 +0530
summary: amfnd: send recovery request to amfd for term-failed su [#2003]

changeset:   8019:21bf64e1130a
tag: tip
parent:  8012:46edfce1d524
user:praveen.malv...@oracle.com
date:Thu Sep 08 14:14:59 2016 +0530
summary: amfnd: send recovery request to amfd for term-failed su [#2003]





---

** [tickets:#2003] amf: SG unstable when SU moves to TERM_FAILED state during 
fresh assignments.**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:31 AM UTC by Praveen
**Last Updated:** Tue Sep 06, 2016 09:30 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[term_failed.tgz](https://sourceforge.net/p/opensaf/tickets/2003/attachment/term_failed.tgz)
 (30.1 kB; application/x-compressed)


Conf: 2N model, one NPI comp in NPI SU.
Steps to reproduce:
1)Add application using immcfg command.
2)Lock SG.
3)Unlock-in and unlock SUs.
4)Make provisions so that the instantiation and cleanup scripts return a 
non-zero status.
5)Unlock SG.

When the SG is unlocked, AMFND initiates active assignments by instantiating 
the only component. After the instantiation failure, AMFND tries to clean up 
the component. Cleanup fails. AMFND marks the comp and SU as TERM_FAILED, but 
it neither responds to AMFD for the completion of the assignment nor sends any 
recovery request. Because of this the SG remains unstable in REALIGN state. In 
this state, no admin operation is allowed.
Attached are traces.

Even though the issue seems similar to #538, it differs in one aspect. 
In #538, the SU moves to TERM_FAILED state and there is a possibility of 
failover/switchover because standby assignments are present.
In the present case, it happened during initial assignments and thus there is 
no standby to switch over or fail over to.




[tickets] [opensaf:tickets] #1973 imm: IMM test returns zero even when it fails

2016-09-08 Thread Hung Nguyen

- **status**: review --> fixed
- **Comment**:

default(5.2) [staging:46edfc]
changeset:   8012:46edfce1d524
user:Hung Nguyen 
date:Fri Aug 26 17:43:01 2016 +0700
summary: imm: Remove pthread_exit from IMM test [#1973]

opensaf-5.1.x [staging:ba9a42]
changeset:   8013:9acf7c9aecab
user:Hung Nguyen 
date:Fri Aug 26 17:43:01 2016 +0700
summary: imm: Remove pthread_exit from IMM test [#1973]

opensaf-5.0.x [staging:0b491e]
changeset:   8014:0b491ef33bb8
user:Hung Nguyen 
date:Fri Aug 26 17:43:01 2016 +0700
summary: imm: Remove pthread_exit from IMM test [#1973]

opensaf-4.7.x [staging:]
changeset:   8015:a2728b93c7c0
user:Hung Nguyen 
date:Fri Aug 26 17:43:01 2016 +0700
summary: imm: Remove pthread_exit from IMM test [#1973]





---

** [tickets:#1973] imm: IMM test returns zero even when it fails**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Fri Aug 26, 2016 08:15 AM UTC by Hung Nguyen
**Last Updated:** Sun Aug 28, 2016 06:25 AM UTC
**Owner:** Hung Nguyen


Snippet from main() in immtest.c

~~~
int main(int argc, char **argv)
{
    ...

    /* Added pthread_exit() to remove dlopen@@GLIBC leak from valgrind */
    pthread_exit(NULL);

    return rc;
}
~~~

pthread_exit() should be removed because it makes the test exit before 'return 
rc' is reached.
I ran valgrind without pthread_exit() and it did not report anything 
about dlopen.
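
To illustrate why the exit code is lost: POSIX specifies that when the last thread of a process terminates through pthread_exit(), the process exits with status zero. The sketch below shows this in a forked child. `child_exit_status` is a hypothetical helper written for this example, and `_exit(rc)` stands in for reaching `return rc` in main().

```cpp
#include <cstdio>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs a tiny child process and returns its exit status, or -1 on error.
// If call_pthread_exit is true, the child's only thread ends through
// pthread_exit() and never reaches the code that would report rc.
int child_exit_status(bool call_pthread_exit, int rc) {
  std::fflush(nullptr);  // avoid duplicated stdio output in the child
  const pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    if (call_pthread_exit) {
      pthread_exit(nullptr);  // last thread ends: process exits with 0
    }
    _exit(rc);  // stand-in for "return rc" from main()
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

Calling `child_exit_status(true, 1)` should report 0 even though the test result was 1, while `child_exit_status(false, 1)` reports 1.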






[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2016-09-08 Thread Ritu Raj




---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create 
checkpoint**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 08, 2016 07:28 AM UTC
**Owner:** nobody
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the end 
of the test scenarios, the following IMM objects & replicas were not deleted:
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2. When a checkpoint was created with the earlier name 
(all_replicas_ckpt_name_101), the following error was observed in syslog. 
CkptOpen also failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2


4. After some time ckptd seg faulted on the active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, 
pKey=0x7d22531c "\017\001\002") at patricia.c:435
2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, 
dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706
3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at 
cpd_evt.c:1378
4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6-  0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled
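
The numeric code in "saImmOiRtObjectCreate_2 failed with error = 14" is an SaAisErrorT value. In saAis.h, 14 is SA_AIS_ERR_EXIST, which would be consistent with the leftover IMM objects listed in step 1. The sketch below decodes such log lines; `ais_error_name` and `ais_error_from_log` are hypothetical helpers, not OpenSAF functions, and only the name table mirrors saAis.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Symbolic names for SaAisErrorT values 1..21, in the order of saAis.h. */
static const char *const ais_names[] = {
    NULL,
    "SA_AIS_OK", "SA_AIS_ERR_LIBRARY", "SA_AIS_ERR_VERSION",
    "SA_AIS_ERR_INIT", "SA_AIS_ERR_TIMEOUT", "SA_AIS_ERR_TRY_AGAIN",
    "SA_AIS_ERR_INVALID_PARAM", "SA_AIS_ERR_NO_MEMORY",
    "SA_AIS_ERR_BAD_HANDLE", "SA_AIS_ERR_BUSY", "SA_AIS_ERR_ACCESS",
    "SA_AIS_ERR_NOT_EXIST", "SA_AIS_ERR_NAME_TOO_LONG", "SA_AIS_ERR_EXIST",
    "SA_AIS_ERR_NO_SPACE", "SA_AIS_ERR_INTERRUPT",
    "SA_AIS_ERR_NAME_NOT_FOUND", "SA_AIS_ERR_NO_RESOURCES",
    "SA_AIS_ERR_NOT_SUPPORTED", "SA_AIS_ERR_BAD_OPERATION",
    "SA_AIS_ERR_FAILED_OPERATION",
};

/* Hypothetical helper: numeric code to name, "UNKNOWN" if out of range. */
const char *ais_error_name(long code)
{
    size_t count = sizeof ais_names / sizeof ais_names[0];
    if (code < 1 || (size_t)code >= count)
        return "UNKNOWN";
    return ais_names[code];
}

/* Hypothetical helper: decode the number following "error = " in a log line. */
const char *ais_error_from_log(const char *line)
{
    static const char marker[] = "error = ";
    assert(line != NULL);

    const char *p = strstr(line, marker);
    if (p == NULL)
        return "UNKNOWN";
    return ais_error_name(strtol(p + strlen(marker), NULL, 10));
}
```

Feeding the osafckptd line from step 2 to `ais_error_from_log` yields "SA_AIS_ERR_EXIST".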



[tickets] [opensaf:tickets] #2010 IMM: library receives wrong response when a ccb is aborted

2016-09-08 Thread Hung Nguyen

- Description has changed:

Diff:



--- old
+++ new
@@ -3,7 +3,7 @@
 In some cases the client is not in a sync call (i.e. not waiting for response) 
but IMMND still sends that response to the client. One example is when the OI 
attaches/deattaches. That may cause the client to receive unexpected response 
if the client at that time calls an sync IMM api.
 
 Details of the problem is explained here
-[Click me 
!!!](http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdBBAUKSs8ISamAcgCJ7jRyIobpU5gBGA9gB7LsBuApmFJMAXFmQwYrALQgOkZFADOyAOb8EgkBH4ATADoIA7gAsNyAPKpk2iCBgmoCVQHpd-W-cfOcORhWkAYlUwfg0APkZKEQoAJkoAfSwAIQsAJQAVBIBhbOTkAApAgEYASj9MLCDWABsAV35I8goxeIocvKSABS6AGQBNQsDY8pYObj5BYWioimRgMHYYfiUlFYkpWXkUAFslVWQAM0Wd4QpDYl1kNYRdFVClYHYENeQIdh4wKFUnbScDhDsdyGNb8RQ7HboIH8GoJSSsLDbAqjWZBUK6JrYESUWJYBKMBIAUTSaXSCViyCMUAgJmQADEsKheoT2hYusSsBlUBYyEMAMylQwAKhFGUcKmUbzMEmeawAjg0EMseIdkCVfGwuDwBEJGFgRHrkKFllABCoaWCHk8XmDLlKnABrc0mbSUkDOy0ra1rQyHdhCYYAGmQrDqKHsEDqIBqNQAnooIAByFQAKzqSnDMptCo0yvYqpKhkMhuN-FN6wtlMWziNXtlYNDKF0UF0CETKBgYHdtLMoUMrH4MBA6bBFh2uQRwGAceRNkk-GAEBUOLxBOJpLS5I1421Uz1IjH2SkWCnM9KtcjYBe9MZzNZXUMJ+nsD+zyz0AQDRUVJplh2WF0HYnAsIxNDAOlfhqKAAC9+GRXw9WqepGlmVpEiwCh0AsBI6VQMgsF6VAAC1CSGAAWUZNQmHVphaWZkEBIxrg0O4pU9R56yOf01ViBDmjRPRMX1Fd8UwIkSTJCkf1pBkmRZBI2Q5LkeX5QUEBFIUxUlSVKytTi-QDXixi1SZdUqA1KhsVQQCcWsTTNGxAQtIQjGrA49JtIsEFQDsuyUMxnR0qAdgbQdh1eMcAKAhAQLAiCEG
 jGC4LU3R2BWNtw3nRdkBEtcJM3XigA)
+http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdBBAUKSs8ISamAcgCJ7jRyIobpU5gBGA9gB7LsBuApmFJMAXFmQwYrALQgOkZFADOyAOb8EgkBH4ATADoIA7gAsNyAPKpk2iCBgmoCVQHpd-W-cfOcORhWkAYlUwfg0APkZKEQoAJkoAfSwAIQsAJQAVBIBhbOTkAApAgEYASj9MLCDWABsAV35I8goxeIocvKSABS6AGQBNQsDY8pYObj5BYWioimRgMHYYfiUlFYkpWXkUAFslVWQAM0Wd4QpDYl1kNYRdFVClYHYENeQIdh4wKFUnbScDhDsdyGNb8RQ7HboIH8GoJSSsLDbAqjWZBUK6JrYESUWJYBKMBIAUTSaXSCViyCMUAgJmQADEsKheoT2hYusSsBlUBYyEMAMylQwAKhFGUcKmUbzMEmeawAjg0EMseIdkCVfGwuDwBEJGFgRHrkKFllABCoaWCHk8XmDLlKnABrc0mbSUkDOy0ra1rQyHdhCYYAGmQrDqKHsEDqIBqNQAnooIAByFQAKzqSnDMptCo0yvYqpKhkMhuN-FN6wtlMWziNXtlYNDKF0UF0CETKBgYHdtLMoUMrH4MBA6bBFh2uQRwGAceRNkk-GAEBUOLxBOJpLS5I1421Uz1IjH2SkWCnM9KtcjYBe9MZzNZXUMJ+nsD+zyz0AQDRUVJplh2WF0HYnAsIxNDAOlfhqKAAC9+GRXw9WqepGlmVpEiwCh0AsBI6VQMgsF6VAAC1CSGAAWUZNQmHVphaWZkEBIxrg0O4pU9R56yOf01ViBDmjRPRMX1Fd8UwIkSTJCkf1pBkmRZBI2Q5LkeX5QUEBFIUxUlSVKytTi-QDXixi1SZdUqA1KhsVQQCcWsTTNGxAQtIQjGrA49JtIsEFQDsuyUMxnR0qAdgbQdh1eMcAKAhAQLAiCEGjGC4LU3R2BWNtw3
 nRdkBEtcJM3XigA
 
 ~~~
 09:45:58 SC-2-2 osafimmnd[3918]: NO ERR_TRY_AGAIN: ccb 1266 is active on 
object CmwSwMswMId=1 of class CmwSwMSwM. Can not add class implementer






---

** [tickets:#2010] IMM: library receives wrong response when a ccb is aborted**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 07:10 AM UTC by Hung Nguyen
**Last Updated:** Thu Sep 08, 2016 07:15 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2010/attachment/logs.7z) 
(6.8 MB; application/octet-stream)


When receiving the ccb abort message (D2ND_ABORT_CCB) over fevs, IMMND will 
abort the ccb and send a response to the client if it's the originating node. 
See immnd_evt_proc_ccb_finalize().

In some cases the client is not in a sync call (i.e. not waiting for a 
response) but IMMND still sends that response to the client. One example is 
when the OI attaches/detaches. That may cause the client to receive an 
unexpected response if the client calls a sync IMM API at that time.

Details of the problem are explained here:
http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdBBAUKSs8ISamAcgCJ7jRyIobpU5gBGA9gB7LsBuApmFJMAXFmQwYrALQgOkZFADOyAOb8EgkBH4ATADoIA7gAsNyAPKpk2iCBgmoCVQHpd-W-cfOcORhWkAYlUwfg0APkZKEQoAJkoAfSwAIQsAJQAVBIBhbOTkAApAgEYASj9MLCDWABsAV35I8goxeIocvKSABS6AGQBNQsDY8pYObj5BYWioimRgMHYYfiUlFYkpWXkUAFslVWQAM0Wd4QpDYl1kNYRdFVClYHYENeQIdh4wKFUnbScDhDsdyGNb8RQ7HboIH8GoJSSsLDbAqjWZBUK6JrYESUWJYBKMBIAUTSaXSCViyCMUAgJmQADEsKheoT2hYusSsBlUBYyEMAMylQwAKhFGUcKmUbzMEmeawAjg0EMseIdkCVfGwuDwBEJGFgRHrkKFllABCoaWCHk8XmDLlKnABrc0mbSUkDOy0ra1rQyHdhCYYAGmQrDqKHsEDqIBqNQAnooIAByFQAKzqSnDMptCo0yvYqpKhkMhuN-FN6wtlMWziNXtlYNDKF0UF0CETKBgYHdtLMoUMrH4MBA6bBFh2uQRwGAceRNkk-GAEBUOLxBOJpLS5I1421Uz1IjH2SkWCnM9KtcjYBe9MZzNZXUMJ+nsD+zyz0AQDRUVJplh2WF0HYnAsIxNDAOlfhqKAAC9+GRXw9WqepGlmVpEiwCh0AsBI6VQMgsF6VAAC1CSGAAWUZNQmHVphaWZkEBIxrg0O4pU9R56yOf01ViBDmjRPRMX1Fd8UwIkSTJCkf1pBkmRZBI2Q5LkeX5QUEBFIUxUlSVKytTi-QDXixi1SZdUqA1KhsVQQCcWsTTNGxAQtIQjGrA49JtIsEFQDsuyUMxnR0qAdgbQdh1eMcAKAhAQLAiCEGjGC4LU3R2BWNtw3n
 RdkBEtcJM3XigA

~~~
09:45:58 SC-2-2 osafimmnd[3918]: NO ERR_TRY_AGAIN: ccb 1266 is active on object 
CmwSwMswMId=1 of class CmwSwMSwM. Can not add class implementer
09:45:58 SC-2-2 osafimmnd[3918]: NO Trying to abort ccb 1266 to allow 
implementer CoreMwSwM to protect class CmwSwMSwM
09:45:58 SC-2-2 osafimmnd[3918]: NO implementer for class 'CmwIspConfig' is 
CmwIsp => class extent is safe.
09:45:58 SC-2-2 osafimmnd[3918]: NO Implementer disconnected 169 <0,

[tickets] [opensaf:tickets] #2010 IMM: library receives wrong response when a ccb is aborted

2016-09-08 Thread Hung Nguyen

- Description has changed:

Diff:



--- old
+++ new
@@ -3,7 +3,7 @@
 In some cases the client is not in a sync call (i.e. not waiting for response) 
but IMMND still sends that response to the client. One example is when the OI 
attaches/deattaches. That may cause the client to receive unexpected response 
if the client at that time calls an sync IMM api.
 
 Details of the problem is explained here
-[Click me 
!!!](http://sequencediagram.org/index.html?initialData=FABwhgTgLglgxjcA7KACAkgWUwQVJWBZNLTAOQBF9p5EwUNsrgIAjAewA9V2A3AUwiNMFAFw5UcOKwC0YDtFQwAzqgDm-JILBR+AEwA6SAO4ALTagDy6VDqhg4pmEjUB6PfzsOnL4MFIUMgDEahD8mgB8pJSiFABMlAD6OABClgBKACqJAMI5KagAFEEAjACU-tg4wawANgCu-FHYMTgJFLn5yQAK3QAyAJpFQXEVLBzcfILCMdEUqCAQ7HD8ysqrktJyCmgAtspqqABmS7vCFEb0eqjrSHqqYcog7EjrqFDsPBAwas46zockOwPEZ1vwlLtdphgfxaokpKwcDtCmM5sEwnpmrhRJQ4jhEqREgBRdLpDKJOKoYwwKCmVAAMRw6D6RI6lm6JJwmXQljIwwAzGUjAAqUWZJyqFTvcySF7rACOjSQKx4R1QpT8bC4PAEQlIOFE+tQYRWMAEqlp4Mez1e4Ku0ucAGsLaYdFSwC6rasbesjEd2EIRgAaVCsepoBxQepgWq1ACeSigAHJVAArerKCOy22KzQq9hq0pGIxGk38M0bS1UpYuY3euXgsNoPQwPRIJNoOAQD108xhIysfhwMAZ8GWXZ5REgEDxlG2KT8EBQVS4-GEklk9IUzUTHXTfWicc5aQ4aezsp1qMQV4MpkstndIynmfwf4vbOwJCNVTU2lWXY4HouzOJYxhaBA9J-LUMAAF78Cifj6jUDRNHM4jtMkFCYJYiT0ugZA4H06AAFpEsMAAsYxapMuozGIcyoECxg3Jo9zSl6TwNscAbqnEiEtIEQQYliBqrgS2DEqS5KUr+dKMsyrKJOynLcryApCkgorCuKUpSlW1pcf6gZ8eM2pTHqVSGlUthqGAzh1qa5q2EClpCMYNaHAZtrFkg6Cdt2yjmC6ekwLsjZDiObzjoBwFIKB4G
 QUgMawfBGl6OwqzthGC5LqgYnrlJW58UAA)
+[Click me 
!!!](http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdBBAUKSs8ISamAcgCJ7jRyIobpU5gBGA9gB7LsBuApmFJMAXFmQwYrALQgOkZFADOyAOb8EgkBH4ATADoIA7gAsNyAPKpk2iCBgmoCVQHpd-W-cfOcORhWkAYlUwfg0APkZKEQoAJkoAfSwAIQsAJQAVBIBhbOTkAApAgEYASj9MLCDWABsAV35I8goxeIocvKSABS6AGQBNQsDY8pYObj5BYWioimRgMHYYfiUlFYkpWXkUAFslVWQAM0Wd4QpDYl1kNYRdFVClYHYENeQIdh4wKFUnbScDhDsdyGNb8RQ7HboIH8GoJSSsLDbAqjWZBUK6JrYESUWJYBKMBIAUTSaXSCViyCMUAgJmQADEsKheoT2hYusSsBlUBYyEMAMylQwAKhFGUcKmUbzMEmeawAjg0EMseIdkCVfGwuDwBEJGFgRHrkKFllABCoaWCHk8XmDLlKnABrc0mbSUkDOy0ra1rQyHdhCYYAGmQrDqKHsEDqIBqNQAnooIAByFQAKzqSnDMptCo0yvYqpKhkMhuN-FN6wtlMWziNXtlYNDKF0UF0CETKBgYHdtLMoUMrH4MBA6bBFh2uQRwGAceRNkk-GAEBUOLxBOJpLS5I1421Uz1IjH2SkWCnM9KtcjYBe9MZzNZXUMJ+nsD+zyz0AQDRUVJplh2WF0HYnAsIxNDAOlfhqKAAC9+GRXw9WqepGlmVpEiwCh0AsBI6VQMgsF6VAAC1CSGAAWUZNQmHVphaWZkEBIxrg0O4pU9R56yOf01ViBDmjRPRMX1Fd8UwIkSTJCkf1pBkmRZBI2Q5LkeX5QUEBFIUxUlSVKytTi-QDXixi1SZdUqA1KhsVQQCcWsTTNGxAQtIQjGrA49JtIsEFQDsuyUMxnR0qAdgbQdh1eMcAKAhAQLAiCEG
 jGC4LU3R2BWNtw3nRdkBEtcJM3XigA)
 
 ~~~
 09:45:58 SC-2-2 osafimmnd[3918]: NO ERR_TRY_AGAIN: ccb 1266 is active on 
object CmwSwMswMId=1 of class CmwSwMSwM. Can not add class implementer






---

** [tickets:#2010] IMM: library receives wrong response when a ccb is aborted**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 07:10 AM UTC by Hung Nguyen
**Last Updated:** Thu Sep 08, 2016 07:10 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2010/attachment/logs.7z) 
(6.8 MB; application/octet-stream)


When receiving the ccb abort message (D2ND_ABORT_CCB) over fevs, IMMND will 
abort the ccb and send a response to the client if it's the originating node. 
See immnd_evt_proc_ccb_finalize().

In some cases the client is not in a sync call (i.e. not waiting for a 
response) but IMMND still sends that response to the client. One example is 
when the OI attaches/detaches. That may cause the client to receive an 
unexpected response if the client calls a sync IMM API at that time.

Details of the problem are explained here:
[Click me 
!!!](http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdBBAUKSs8ISamAcgCJ7jRyIobpU5gBGA9gB7LsBuApmFJMAXFmQwYrALQgOkZFADOyAOb8EgkBH4ATADoIA7gAsNyAPKpk2iCBgmoCVQHpd-W-cfOcORhWkAYlUwfg0APkZKEQoAJkoAfSwAIQsAJQAVBIBhbOTkAApAgEYASj9MLCDWABsAV35I8goxeIocvKSABS6AGQBNQsDY8pYObj5BYWioimRgMHYYfiUlFYkpWXkUAFslVWQAM0Wd4QpDYl1kNYRdFVClYHYENeQIdh4wKFUnbScDhDsdyGNb8RQ7HboIH8GoJSSsLDbAqjWZBUK6JrYESUWJYBKMBIAUTSaXSCViyCMUAgJmQADEsKheoT2hYusSsBlUBYyEMAMylQwAKhFGUcKmUbzMEmeawAjg0EMseIdkCVfGwuDwBEJGFgRHrkKFllABCoaWCHk8XmDLlKnABrc0mbSUkDOy0ra1rQyHdhCYYAGmQrDqKHsEDqIBqNQAnooIAByFQAKzqSnDMptCo0yvYqpKhkMhuN-FN6wtlMWziNXtlYNDKF0UF0CETKBgYHdtLMoUMrH4MBA6bBFh2uQRwGAceRNkk-GAEBUOLxBOJpLS5I1421Uz1IjH2SkWCnM9KtcjYBe9MZzNZXUMJ+nsD+zyz0AQDRUVJplh2WF0HYnAsIxNDAOlfhqKAAC9+GRXw9WqepGlmVpEiwCh0AsBI6VQMgsF6VAAC1CSGAAWUZNQmHVphaWZkEBIxrg0O4pU9R56yOf01ViBDmjRPRMX1Fd8UwIkSTJCkf1pBkmRZBI2Q5LkeX5QUEBFIUxUlSVKytTi-QDXixi1SZdUqA1KhsVQQCcWsTTNGxAQtIQjGrA49JtIsEFQDsuyUMxnR0qAdgbQdh1eMcAKAhAQLAiCEGj
 GC4LU3R2BWNtw3nRdkBEtcJM3XigA)

~~~
09:45:58 SC-2-2 osafimmnd[3918]: NO ERR_TRY_AGAIN: ccb 1266 is active on object 
CmwSwMswMId=1 of class CmwSwMSwM. Can not add class implementer
09:45:58 SC-2-2 osafimmnd[3918]: NO Trying to abort ccb 1266 to allow 
implementer CoreMwSwM to protect class CmwSwMSwM
09:45:58 SC-2-2 osafimmnd[3918]: NO implementer for class 'CmwIspConfig' is 
CmwIsp => class extent is safe.
09:45:58 SC-2-2 osafimmnd[3918]: NO

[tickets] [opensaf:tickets] #2010 IMM: library receives wrong response when a ccb is aborted

2016-09-08 Thread Hung Nguyen




---

** [tickets:#2010] IMM: library receives wrong response when a ccb is aborted**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 07:10 AM UTC by Hung Nguyen
**Last Updated:** Thu Sep 08, 2016 07:10 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2010/attachment/logs.7z) 
(6.8 MB; application/octet-stream)


When receiving the ccb abort message (D2ND_ABORT_CCB) over fevs, IMMND will 
abort the ccb and send a response to the client if it's the originating node. 
See immnd_evt_proc_ccb_finalize().

In some cases the client is not in a sync call (i.e. not waiting for a 
response) but IMMND still sends that response to the client. One example is 
when the OI attaches/detaches. That may cause the client to receive an 
unexpected response if the client calls a sync IMM API at that time.

Details of the problem are explained here:
[Click me 
!!!](http://sequencediagram.org/index.html?initialData=FABwhgTgLglgxjcA7KACAkgWUwQVJWBZNLTAOQBF9p5EwUNsrgIAjAewA9V2A3AUwiNMFAFw5UcOKwC0YDtFQwAzqgDm-JILBR+AEwA6SAO4ALTagDy6VDqhg4pmEjUB6PfzsOnL4MFIUMgDEahD8mgB8pJSiFABMlAD6OABClgBKACqJAMI5KagAFEEAjACU-tg4wawANgCu-FHYMTgJFLn5yQAK3QAyAJpFQXEVLBzcfILCMdEUqCAQ7HD8ysqrktJyCmgAtspqqABmS7vCFEb0eqjrSHqqYcog7EjrqFDsPBAwas46zockOwPEZ1vwlLtdphgfxaokpKwcDtCmM5sEwnpmrhRJQ4jhEqREgBRdLpDKJOKoYwwKCmVAAMRw6D6RI6lm6JJwmXQljIwwAzGUjAAqUWZJyqFTvcySF7rACOjSQKx4R1QpT8bC4PAEQlIOFE+tQYRWMAEqlp4Mez1e4Ku0ucAGsLaYdFSwC6rasbesjEd2EIRgAaVCsepoBxQepgWq1ACeSigAHJVAArerKCOy22KzQq9hq0pGIxGk38M0bS1UpYuY3euXgsNoPQwPRIJNoOAQD108xhIysfhwMAZ8GWXZ5REgEDxlG2KT8EBQVS4-GEklk9IUzUTHXTfWicc5aQ4aezsp1qMQV4MpkstndIynmfwf4vbOwJCNVTU2lWXY4HouzOJYxhaBA9J-LUMAAF78Cifj6jUDRNHM4jtMkFCYJYiT0ugZA4H06AAFpEsMAAsYxapMuozGIcyoECxg3Jo9zSl6TwNscAbqnEiEtIEQQYliBqrgS2DEqS5KUr+dKMsyrKJOynLcryApCkgorCuKUpSlW1pcf6gZ8eM2pTHqVSGlUthqGAzh1qa5q2EClpCMYNaHAZtrFkg6Cdt2yjmC6ekwLsjZDiObzjoBwFIKB4GQ
 UgMawfBGl6OwqzthGC5LqgYnrlJW58UAA)

~~~
09:45:58 SC-2-2 osafimmnd[3918]: NO ERR_TRY_AGAIN: ccb 1266 is active on object 
CmwSwMswMId=1 of class CmwSwMSwM. Can not add class implementer
09:45:58 SC-2-2 osafimmnd[3918]: NO Trying to abort ccb 1266 to allow 
implementer CoreMwSwM to protect class CmwSwMSwM
09:45:58 SC-2-2 osafimmnd[3918]: NO implementer for class 'CmwIspConfig' is 
CmwIsp => class extent is safe.
09:45:58 SC-2-2 osafimmnd[3918]: NO Implementer disconnected 169 <0, 2010f> 
(@ClusMonEE)
09:45:58 SC-2-2 osafimmnd[3918]: NO Ccb 1266 ABORTED 
(CoreMwEcimSwMBackgroundThread)
09:45:58 SC-2-2 ecimswm: ImmUtils::doImmOperations:saImmOmCcbApply failed 
SaAisErrorT=21
09:45:58 SC-2-2 ecimswm: EcimSwmAsyncImmOperation::main() failed with rc = 
21(SA_AIS_ERR_FAILED_OPERATION)
09:45:58 SC-2-2 ecimswm: imma_om_api.c:8769: saImmOmAdminOwnerFinalize: 
Assertion 'out_evt->info.imma.type == IMMA_EVT_ND2A_IMM_ERROR' failed.
09:45:58 SC-2-2 osafimmnd[3918]: NO Implementer connected: 173 (ClusMonEE) <0, 
2010f>
09:45:58 SC-2-2 osafimmnd[3918]: WA >>s_info->to_svc == 0<< reply context 
destroyed before this reply could be made
09:45:58 SC-2-2 osafimmnd[3918]: WA Failed to send response to agent/client 
over MDS
~~~

Attached are the syslog and IMM traces.



[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-08 Thread Vu Minh Nguyen

AIS states that `additionalText` and `lengthAdditionalText` must be consistent.
A check for this needs to be added, returning INVALID_PARAM if there is a mismatch.
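
A minimal sketch of such a check follows. It assumes that `lengthAdditionalText` counts the terminating NUL, so the two fields are consistent when the first NUL within the declared length is its last byte. That interpretation, the `struct ntf_text` type and the `check_additional_text` function are all assumptions made for illustration. The two constants mirror the values of SA_AIS_OK and SA_AIS_ERR_INVALID_PARAM in saAis.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for SA_AIS_OK (1) and SA_AIS_ERR_INVALID_PARAM (7). */
enum { AIS_OK = 1, AIS_ERR_INVALID_PARAM = 7 };

/* Hypothetical view of the two notification fields named in the comment. */
struct ntf_text {
    const char *additional_text;
    size_t      length_additional_text; /* assumed to include the NUL */
};

/*
 * Return AIS_OK when text and length agree, AIS_ERR_INVALID_PARAM otherwise.
 * The caller must ensure the buffer holds at least length_additional_text
 * bytes or is NUL-terminated earlier.
 */
int check_additional_text(const struct ntf_text *t)
{
    assert(t != NULL);

    if (t->length_additional_text == 0)
        return AIS_OK;                      /* no additional text supplied */
    if (t->additional_text == NULL)
        return AIS_ERR_INVALID_PARAM;       /* length without a buffer */

    const char *nul = memchr(t->additional_text, '\0',
                             t->length_additional_text);
    if (nul == NULL)
        return AIS_ERR_INVALID_PARAM;       /* not terminated within length */
    if ((size_t)(nul - t->additional_text) + 1 != t->length_additional_text)
        return AIS_ERR_INVALID_PARAM;       /* terminated too early */
    return AIS_OK;
}
```

Rejecting the mismatch at the API boundary keeps an inconsistent length from reaching the logging path where ntfd aborted.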


---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 08:22 AM UTC
**Owner:** Vu Minh Nguyen
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)


OS : Suse PPC 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
no PBE )

Ntfd traces and syslog for both controllers attached
*  Ntf Application is running on system
*  Will update ticket with core dump 

Note: The system clocks are not synced. Node times change after every 
reboot.

BT:
0  0x0fffa0848100 in .raise () from /lib64/libc.so.6
1  0x0fffa0849d10 in .abort () from /lib64/libc.so.6
2  0x0fffa0e34234 in osaf_abort (i_cause=7) at osaf_utility.c:27
3  0x1001a2f8 in NtfLogger::logNotification (this=0x100ba768, notif=
std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:247
4  0x10019e60 in NtfLogger::checkQueueAndLog (this=0x100ba768,
newNotif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at 
NtfLogger.cc:181
5  0x10019a74 in NtfLogger::log (this=0x100ba768, 
notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0,
isLocal=true) at NtfLogger.cc:137
6  0x1002b528 in NtfAdmin::processNotification (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc, notificationId=47) at 
NtfAdmin.cc:203
7  0x1002b938 in NtfAdmin::notificationReceived (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:257
8  0x1002ec20 in notificationReceived (clientId=62, 
notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800,
mdsCtxt=0x100bb3dc) at NtfAdmin.cc:1012
9  0x10006410 in proc_send_not_msg (cb=0x10073190 <_ntfs_cb>, 
evt=0x100bb3d0) at ntfs_evt.c:447
10 0x10006b28 in process_api_evt (evt=0x100bb3d0) at ntfs_evt.c:628
11 0x10006c38 in ntfs_process_mbx (mbx=0x10073190 <_ntfs_cb>) at 
ntfs_evt.c:660
12 0x1000b6f4 in main (argc=2, argv=0xfffc056a7f8) at ntfs_main.c:399


Active Controller:
May 26 19:41:16 linux-pvra osafntfd[24205]: **osaf_abort(7) called from 
0x1001a2f8 with errno=11**
May 26 19:41:16 linux-pvra osafamfnd[24243]: NO 
'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
May 26 19:41:16 linux-pvra osafamfnd[24243]: ER 
safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
May 26 19:41:16 linux-pvra osafamfnd[24243]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
May 26 19:41:16 linux-pvra opensaf_reboot: Rebooting local node; timeout=60
Jun  2 14:11:28 linux-pvra syslog-ng[1639]: syslog-ng starting up; 
version='2.0.9'


Ntf Trace:
May 26 19:41:16.767426 osafntfd [24205:lga_api.c:1190] TR logBufSize > 
strlen(logBuf) + 1
May 26 19:41:16.767436 osafntfd [24205:lga_api.c:1320] << saLogWriteLogAsync
Jun  2 14:11:47.153831 osafntfd [2958:ntfs_main.c:0181] >> initialize
Jun  2 14:11:47.175099 osafntfd [2958:ncs_main_pub.c:0220] TR
NCS:PROCESS_ID=2958




[tickets] [opensaf:tickets] #1994 IMMSv: Finalized CCB are counted under Max Ccb Limit

2016-09-08 Thread Neelakanta Reddy

- **status**: unassigned --> accepted
- **assigned_to**: Neelakanta Reddy
- **Part**: - --> nd
- **Milestone**: 4.7.2 --> 5.1.RC1
- **Comment**:

The limit is considered only for active ccbs
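
As an illustration of counting only active CCBs against the limit, here is a minimal sketch. The `enum ccb_state`, `struct ccb` and both functions are hypothetical and much simpler than IMMND's real CCB bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CCB states; IMMND's real state machine has more detail. */
enum ccb_state { CCB_ACTIVE, CCB_COMMITTED, CCB_ABORTED };

struct ccb {
    unsigned       id;
    enum ccb_state state;
};

/* Count only CCBs that are still active; finished ones are ignored. */
size_t count_active_ccbs(const struct ccb *ccbs, size_t n)
{
    assert(ccbs != NULL || n == 0);

    size_t active = 0;
    for (size_t i = 0; i < n; i++)
        if (ccbs[i].state == CCB_ACTIVE)
            active++;
    return active;
}

/* A new CCB is admitted only while the active count is below the limit. */
bool may_create_ccb(const struct ccb *ccbs, size_t n, size_t max_ccbs)
{
    return count_active_ccbs(ccbs, n) < max_ccbs;
}
```

With this rule, committed CCBs that have not yet been garbage collected no longer block the creation of new ones.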



---

** [tickets:#1994] IMMSv: Finalized CCB are counted under Max Ccb Limit**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Thu Sep 01, 2016 12:32 PM UTC by Chani Srivastava
**Last Updated:** Thu Sep 01, 2016 12:49 PM UTC
**Owner:** Neelakanta Reddy


setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
1PBE with 30K objects

- Default maxCcb is configured to 1 as in object 
opensafImm=opensafImm,safApp=safImmService
- Try creating more than one Ccb operation
~~~
for (( i = 1; i <= 2; i++ )); do
   immcfg -c TestClass testClass=$i
done
~~~
The above operation fails with ERR_NO_RESOURCES after the Ccb count for the 
cluster reaches 1. Even after the max limit is reached, more Ccbs are allowed 
a few minutes later. See the syslog snippet below.



Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45008 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45009 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45010 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45011 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45012 COMMITTED 
(chaniTestClass)
**Sep  1 *14:58:35* OSAF-SC1 osafimmnd[27298]: *NO ERR_NO_RESOURCES: maximum 
Ccbs limit 2 has been reached for the cluster***
Sep  1 15:00:34 OSAF-SC1 syslog-ng[1194]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=92951', processed='center(received)=47084', 
processed='destination(messages)=47077', processed='destination(mailinfo)=7', 
processed='destination(mailwarn)=0', 
processed='destination(localmessages)=45786', 
processed='destination(newserr)=0', processed='destination(mailerr)=0', 
processed='destination(netmgm)=0', processed='destination(warn)=42', 
processed='destination(console)=16', processed='destination(null)=0', 
processed='destination(mail)=7', processed='destination(xconsole)=16', 
processed='destination(firewall)=0', processed='destination(acpid)=0', 
processed='destination(newscrit)=0', processed='destination(newsnotice)=0', 
processed='source(src)=47084'
**Sep  1 *15:10:14 *OSAF-SC1 osafimmnd[27298]: *NO Ccb 45014 COMMITTED 
(chaniTestClass)***
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45015 COMMITTED 
(chaniTestClass)
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45016 COMMITTED 
(chaniTestClass)
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45017 COMMITTED 
(chaniTestClass)




[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-09-08 Thread Srikanth R

-> In addition to the steps mentioned in the ticket, the following message is 
printed in syslog for the operations below.



Sep  8 12:06:29 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12
Sep  8 12:06:35 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12
Sep  8 12:06:45 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12
Sep  8 12:06:55 CONTROLLER-1 osafamfd[]: ER exec: create FAILED 12


 Below are the steps.
 
 -> Delete all the application objects.
 -> Perform the middleware switchover / failover. 
 -> The new active controller tries to access the application SI object that 
was deleted earlier.
 
 
 Sep  8 12:08:36.647738 osafamfd [:main.cc:0810] << process_event
Sep  8 12:08:36.647743 osafamfd [:imm.cc:0396] >> execute
Sep  8 12:08:36.647748 osafamfd [:imm.cc:0142] >> exec: Create 
safCsi=CSI1,safSi=TestApp_SI4,safApp=TestApp_TwoN
Sep  8 12:08:36.647754 osafamfd [:imma_oi_api.c:2786] >> 
rt_object_create_common
Sep  8 12:08:36.647761 osafamfd [:imma_oi_api.c:2892] TR attr:safCSIComp
Sep  8 12:08:36.647768 osafamfd [:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAState
Sep  8 12:08:36.647795 osafamfd [:imma_oi_api.c:2892] TR 
attr:saAmfCSICompHAReadinessState
Sep  8 12:08:36.650289 osafamfd [:imma_oi_api.c:3063] << 
rt_object_create_common
Sep  8 12:08:36.650330 osafamfd [:imm.cc:0163] ER exec: create FAILED 12



---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware 
failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Thu Sep 08, 2016 06:09 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4  ( si-si deps enabled)


Summary :
--
Application SIs are moving to UNASSIGNED state after middleware failover.


Steps followed & Observed behaviour
--
 -> Initially brought up the AMF application (2N model) on two payloads.
 -> All the SIs were in fully assigned state and the SUs were in INSERVICE state.
 -> Performed a middleware failover.
 -> After the standby became the active controller, the SIs moved to unassigned 
state, but 'amf-state siass' shows the proper output.
 -> The application received CSI remove callbacks after locking the SUs.


Expected behaviour
--
-> As no fault happened on the application, SIs should not move to UNASSIGNED 
state for middleware failover.



[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-09-08 Thread Srikanth R

amfd traces on both the controllers


Attachments:

- 
[2009.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/98b72c10/7108/attachment/2009.tgz)
 (849.1 kB; application/x-compressed-tar)


---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware 
failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Thu Sep 08, 2016 06:07 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4  ( si-si deps enabled)


Summary :
--
Application SIs are moving to UNASSIGNED state after middleware failover.


Steps followed & Observed behaviour
--
 -> Initially brought up the AMF application (2N model) on two payloads.
 -> All the SIs were in fully assigned state and the SUs were in INSERVICE state.
 -> Performed a middleware failover.
 -> After the standby became the active controller, the SIs moved to unassigned 
state, but 'amf-state siass' shows the proper output.
 -> The application received CSI remove callbacks after locking the SUs.


Expected behaviour
--
-> As no fault happened on the application, SIs should not move to UNASSIGNED 
state for middleware failover.



[tickets] [opensaf:tickets] #2009 AMF: App Si is moving to UNASSIGNED state after middleware failover

2016-09-08 Thread Srikanth R




---

** [tickets:#2009] AMF: App Si is moving to UNASSIGNED state after middleware 
failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:07 AM UTC by Srikanth R
**Last Updated:** Thu Sep 08, 2016 06:07 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature enabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4  ( si-si deps enabled)


Summary :
--
Application SIs are moving to UNASSIGNED state after middleware failover.


Steps followed & Observed behaviour
--
 -> Initially brought up the AMF application (2N model) on two payloads.
 -> All the SIs were in fully assigned state and the SUs were in INSERVICE state.
 -> Performed a middleware failover.
 -> After the standby became the active controller, the SIs moved to unassigned 
state, but 'amf-state siass' shows the proper output.
 -> The application received CSI remove callbacks after locking the SUs.


Expected behaviour
--
-> As no fault happened on the application, SIs should not move to UNASSIGNED 
state for middleware failover.



