[tickets] [opensaf:tickets] #2000 msg: Cluster reset happened due to msgd crash on both controllers

2016-09-06 Thread Ritu Raj
I attached the backtrace (bt) and the msgd trace file; below is a snippet of the bt:

#2  0x7f44089ef197 in __osafassert_fail (__file=0x7f4408a41987 
"osaf_extended_name.c", __line=139, __func=0x7f4408a419f0 <__FUNCTION__.2883> 
"osaf_extended_name_length",
__assertion=0x7f4408a41960 "length < SA_MAX_UNEXTENDED_NAME_LENGTH") at 
sysf_def.c:281
#3  0x7f44089ead1e in osaf_extended_name_length (name=0x67a72c) at 
osaf_extended_name.c:139
#4  0x7f44089fe7ff in osaf_encode_sanamet (ub=0x7fff9f4f09d0, 
name=0x67a72c) at hj_enc.c:403
#5  0x7f44089eb275 in ncs_edp_sanamet (hdl=0x6654c0, edu_tkn=0x0, 
ptr=0x67a72c, ptr_data_len=0x7fff9f4eee14, buf_env=0x7fff9f4f0130, 
op=EDP_OP_TYPE_ENC, o_err=0x7fff9f4f0238) at saf_edu.c:62
#6  0x7f44089f8ca1 in ncs_edu_run_edp (edu_hdl=0x6654c0, edu_tkn=0x0, 
rule=0x7fff9f4ef190, edp=0x404f40 , ptr=0x67a72c, 
dcnt=0x7fff9f4eee14, buf_env=0x7fff9f4f0130,
optype=EDP_OP_TYPE_ENC, o_err=0x7fff9f4f0238) at hj_edu.c:499
#7  0x7f44089f99b2 in ncs_edu_prfm_enc_on_non_ptr (edu_hdl=0x6654c0, 
edu_tkn=0x0, hdl_node=0x0, rule=0x7fff9f4ef190, ptr=0x67a72c, 
ptr_data_len=0x7fff9f4ef364, buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238)
at hj_edu.c:972
#8  0x7f44089f9302 in ncs_edu_perform_exec_action_on_non_ptr 
(edu_hdl=0x6654c0, edu_tkn=0x0, hdl_node=0x0, rule=0x7fff9f4ef190, 
optype=EDP_OP_TYPE_ENC, ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364,
buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238) at hj_edu.c:805
#9  0x7f44089f92a0 in ncs_edu_perform_exec_action (edu_hdl=0x6654c0, 
edu_tkn=0x0, hdl_node=0x0, rule=0x7fff9f4ef190, optype=EDP_OP_TYPE_ENC, 
ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364,
buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238) at hj_edu.c:780
#10 0x7f44089f9041 in ncs_edu_exec_rule (edu_hdl=0x6654c0, edu_tkn=0x0, 
hdl_node=0x0, rule=0x7fff9f4ef190, ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364, 
buf_env=0x7fff9f4f0130, optype=EDP_OP_TYPE_ENC,
o_err=0x7fff9f4f0238) at hj_edu.c:627
#11 0x7f44089fa8db in ncs_edu_run_rules_for_enc (edu_hdl=0x6654c0, 
edu_tkn=0x0, hdl_node=0x0, prog=0x7fff9f4ef150, ptr=0x67a72c, 
ptr_data_len=0x7fff9f4ef364, buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238,
instr_count=4) at hj_edu.c:1666


Attachments:

- [bt_msgd.tar](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/1ff9fd44/ec64/attachment/bt_msgd.tar) (20.5 kB; application/x-tar)
- [osafmsgd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/1ff9fd44/ec64/attachment/osafmsgd) (280.6 kB; application/octet-stream)


---

** [tickets:#2000] msg: Cluster reset happened due to msgd crash on both 
controllers**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 06, 2016 06:04 AM UTC
**Owner:** nobody
**Attachments:**

- [Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog) (716.7 kB; application/octet-stream)
- [Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog) (696.4 kB; application/octet-stream)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :
--
A cluster reset happened because the assertion 
'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.

Steps followed & Observed behaviour
--
1.  Invoked failover.
2.  After a few successful failovers, the new active controller rebooted 
because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. 
While the previous active controller was rejoining the cluster in the standby 
role, a cluster reset happened.
[Timeline: Sep  6 00:13:02 sofo-s2]

Sep  6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, 
dest:13)
Sep  6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: 
osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' 
failed.
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: NO 
'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: ER 
safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:
1. Syslog attached
2. msgnd & msgd traces were not enabled
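
The assertion in frame #3 of the backtrace guards the length field of a classic (unextended) name before it is encoded. The following is a minimal model of that kind of check, not the OpenSAF source: the struct layout, the limit of 256, and every `model_*` name are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit for a classic (unextended) name; a hypothetical stand-in
 * for SA_MAX_UNEXTENDED_NAME_LENGTH. */
#define MODEL_MAX_UNEXTENDED_NAME_LENGTH 256

/* Hypothetical simplified name: a length field plus a fixed buffer. */
typedef struct {
    unsigned short length;
    char value[MODEL_MAX_UNEXTENDED_NAME_LENGTH];
} model_name_t;

/* Returns true when the length field satisfies the asserted condition. */
bool model_name_length_is_valid(const model_name_t *name)
{
    return name->length < MODEL_MAX_UNEXTENDED_NAME_LENGTH;
}

/* Fills a name from a C string; refuses strings that do not fit. */
bool model_name_set(model_name_t *name, const char *str)
{
    size_t len = strlen(str);
    if (len >= MODEL_MAX_UNEXTENDED_NAME_LENGTH)
        return false;
    memset(name, 0, sizeof(*name));
    name->length = (unsigned short)len;
    memcpy(name->value, str, len);
    return true;
}
```

In this model, a name whose length field was never initialized or was overwritten would fail the check. That is only one possible way such an assertion could fire; the ticket itself does not establish the root cause.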


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

[tickets] [opensaf:tickets] #1999 osafntfd on active controller crashed while logging to alarm stream

2016-09-06 Thread A V Mahesh (AVM)
- **status**: unassigned --> accepted
- **assigned_to**: A V Mahesh (AVM)
- **Component**: ntf --> log
- **Milestone**: 4.7.2 --> 5.1.RC1
- **Comment**:

Even when linked with the new A.2.2 agent code, if a client calls 
saLogInitialize() with version A.2.1, the CLM status should be ignored.
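
The rule stated in this comment can be sketched as a version gate. This is a hedged illustration, not the LOG agent code: `model_version_t`, `model_client_is_clm_aware()` and `model_must_reject()` are hypothetical names, and the assumption is that only clients requesting A.2.2 or later are subject to the CLM membership check.

```c
#include <stdbool.h>

/* Hypothetical mirror of an AIS version triple (release, major, minor). */
typedef struct {
    char release_code;   /* e.g. 'A' */
    unsigned char major; /* e.g. 2 */
    unsigned char minor; /* e.g. 1 or 2 */
} model_version_t;

/* True if the requested version is A.2.2 or later. */
bool model_client_is_clm_aware(const model_version_t *v)
{
    if (v->release_code != 'A')
        return v->release_code > 'A';
    if (v->major != 2)
        return v->major > 2;
    return v->minor >= 2;
}

/* Decides whether an API call must be rejected because of CLM status. */
bool model_must_reject(const model_version_t *v, bool node_is_clm_member)
{
    /* Clients that asked for A.2.1 keep the old behaviour:
     * the CLM status is ignored. */
    if (!model_client_is_clm_aware(v))
        return false;
    return !node_is_clm_member;
}
```

With this gate, a client initialized with A.2.1 is never rejected on CLM grounds, even though it runs against agent code that also implements A.2.2.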





---

** [tickets:#1999] osafntfd on active controller crashed while logging to alarm 
stream**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 09:34 AM UTC
**Owner:** A V Mahesh (AVM)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Summary :
--
NTFD crashed on the active controller while logging a notification to the alarm stream.


Steps followed & Observed behaviour
--
 -> Initially performed couple of switchovers and tests on AMF application.
 -> Performed CLM lock operation of standby SC-1 and later unlocked.
 -> Performed switchover such that SC-1 became active controller.
 -> Stopped opensafd on PL-4. NTFD on active controller crashed.
 
Sep  6 10:18:25 CONTROLLER-1 osafamfd[2262]: NO Node 'PL-4' left the cluster
..
Sep  6 10:18:25 CONTROLLER-1 osafntfd[2242]: osaf_abort(31) called from 
0x414d1e with errno=11
Sep  6 10:18:25 CONTROLLER-1 osafamfnd[2272]: NO 
'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

-> Below is the excerpt from the ntfd trace.

Sep  6 10:18:25.436394 osafntfd [2242:NtfAdmin.cc:0252] T2 New notification 
received, id: 682
Sep  6 10:18:25.436398 osafntfd [2242:NtfAdmin.cc:0187] >> processNotification
Sep  6 10:18:25.436404 osafntfd [2242:NtfNotification.cc:0045] T3 constructor 
0x685790, notId: 682
Sep  6 10:18:25.436409 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep  6 10:18:25.436412 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep  6 10:18:25.436425 osafntfd [2242:NtfAdmin.cc:0200] T2 notification 682 
with type 16384 added, notificationMap size is 1
Sep  6 10:18:25.436431 osafntfd [2242:NtfLogger.cc:0130] >> log
Sep  6 10:18:25.436435 osafntfd [2242:NtfLogger.cc:0132] T2 notification Id=682 
received in logger with size 0
Sep  6 10:18:25.436439 osafntfd [2242:NtfLogger.cc:0135] T2 IS LOCAL, logging
Sep  6 10:18:25.436442 osafntfd [2242:NtfLogger.cc:0166] >> checkQueueAndLog
Sep  6 10:18:25.436447 osafntfd [2242:NtfLogger.cc:0196] >> logNotification
Sep  6 10:18:25.436452 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep  6 10:18:25.436455 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep  6 10:18:25.436460 osafntfd [2242:NtfLogger.cc:0231] T2 Logging 
notification to alarm stream
Sep  6 10:18:25.436495 osafntfd [2242:lga_api.c:1151] >> saLogWriteLogAsync
Sep  6 10:18:25.436500 osafntfd [2242:lga_api.c:1015] >> handle_log_record
Sep  6 10:18:25.436507 osafntfd [2242:lga_api.c:1110] << handle_log_record
Sep  6 10:18:25.436518 osafntfd [2242:lga_api.c:1229] TR **saLogWriteLogAsync 
Node not CLM member or stale client**
Sep  6 10:18:25.436524 osafntfd [2242:lga_api.c:1320] << saLogWriteLogAsync
Sep  6 10:18:42.472616 osafntfd [2176:ntfs_main.c:0181] >> initialize




---

--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1993 amf: amfnd crashes during su lock if CSI attribute name or value is a long dn.

2016-09-06 Thread Praveen
- **Milestone**: future --> 5.1.RC1



---

** [tickets:#1993] amf: amfnd crashes during su lock if CSI attribute name or 
value is a long dn.**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Thu Sep 01, 2016 11:09 AM UTC by Praveen
**Last Updated:** Thu Sep 01, 2016 11:10 AM UTC
**Owner:** Praveen
**Attachments:**

- [amfnd_crash.tgz](https://sourceforge.net/p/opensaf/tickets/1993/attachment/amfnd_crash.tgz) (69.4 kB; application/x-compressed)


Configuration:
In the long DN AMF demo, add a CSI attribute to the CSI and set the attribute 
value to a long DN.
1) Bring the configuration up.
2) Lock the SU.
3) AMFND crashes.

AMFND uses memcpy() and therefore works with the original CSI attribute values 
from csi_rec.
It frees the memory in avsv_amf_cbk_free() when the CSI_SET callback arrives. 
During SU lock, it again tries to free the same memory while deleting the record.
In AMFND and AMFD, all SaNameT handling should be done using the 
osaf_extended_name_alloc() API.

The issue also applies to messages related to the CSI attribute change 
callback.
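
The double free described above is the usual consequence of copying a struct that holds heap pointers with memcpy(): both copies point at the same buffers, and each owner frees them once. The sketch below shows the deep-copy alternative with plain C strings. It is an illustration under assumptions, not AMFND code: `model_csi_attr_t` and the `model_*` functions are hypothetical, and real code would go through the extended-name API named in the ticket rather than raw `char *` buffers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical CSI attribute: heap-allocated name and value strings. */
typedef struct {
    char *name;
    char *value;
} model_csi_attr_t;

/* Helper: duplicate a string with malloc. */
static char *model_dup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Deep copy: the destination owns its own buffers. Returns 0 on success. */
int model_csi_attr_copy(model_csi_attr_t *dst, const model_csi_attr_t *src)
{
    dst->name = model_dup(src->name);
    dst->value = model_dup(src->value);
    if (dst->name == NULL || dst->value == NULL) {
        free(dst->name);
        free(dst->value);
        dst->name = NULL;
        dst->value = NULL;
        return -1;
    }
    return 0;
}

/* Frees the buffers and clears the pointers, so a second call is harmless. */
void model_csi_attr_free(model_csi_attr_t *a)
{
    free(a->name);
    free(a->value);
    a->name = NULL;
    a->value = NULL;
}
```

Because the callback copy and the record each own separate buffers, freeing the callback data no longer invalidates the record, and clearing the pointers after free() makes an accidental second free a no-op.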







---



[tickets] [opensaf:tickets] #1788 cpsv: saCkptCheckpointWrite() returns SA_AIS_ERR_NOT_EXIST after headless state

2016-09-06 Thread A V Mahesh (AVM)
- **status**: review --> fixed
- **Comment**:

changeset:   8011:6accddff2419
parent:  8007:661036525753
user:Hoang Vo 
date:Wed Sep 07 09:18:58 2016 +0530
summary: cpd: To reduce updating time out [#1788]
 
changeset:   8012:723f2cdad674
branch:  opensaf-5.1.x
parent:  8008:ba9a421fbacf
user:Hoang Vo 
date:Wed Sep 07 09:19:28 2016 +0530
summary: cpd: To reduce updating time out [#1788]
 
changeset:   8013:260bf6c3a621
branch:  opensaf-5.0.x
tag: tip
parent:  8009:a2713c3caf11
user:Hoang Vo 
date:Wed Sep 07 09:19:52 2016 +0530
summary: cpd: To reduce updating time out [#1788]



---

** [tickets:#1788] cpsv: saCkptCheckpointWrite() returns SA_AIS_ERR_NOT_EXIST 
after headless state**

**Status:** fixed
**Milestone:** 5.0.1
**Created:** Thu Apr 28, 2016 02:20 AM UTC by Pham Hoang Nhat
**Last Updated:** Fri May 13, 2016 02:12 AM UTC
**Owner:** Pham Hoang Nhat


The problem happened in the following scenario:

1. Application calls saCkptCheckpointOpen() to create a collocated checkpoint 
on SC-2. Replica of the checkpoint on SC-2 is active
2. Application calls saCkptCheckpointOpen() to open a collocated checkpoint on 
PL-5.
3. Application creates section and accesses the checkpoint on PL-5.
4. Both SCs are down.
5. Both SCs are up again.
6. Application accesses the checkpoint with saCkptCheckpointWrite(). The error 
code SA_AIS_ERR_NOT_EXIST is returned.

This problem happened because the osafckptnd process ID on SC-2 is the same 
before and after the headless state, so its MDS destination is also the same. 
When SC-2 comes back up, there is a short window in which CPD has not yet 
assigned a new active replica. During that window the application sends its 
checkpoint access request to the CPND on SC-2, which no longer hosts the active 
replica, and that CPND returns SA_AIS_ERR_NOT_EXIST.
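
The effect of an unchanged destination can be modelled in a few lines. This is a sketch under assumptions: the ticket does not describe how MDS derives a destination, so building it from a (node id, process id) pair is hypothetical, as are all `model_*` names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical destination built only from node id and process id. */
typedef struct {
    uint32_t node_id;
    uint32_t pid;
} model_dest_t;

bool model_dest_equal(model_dest_t a, model_dest_t b)
{
    return a.node_id == b.node_id && a.pid == b.pid;
}

/* Client-side cache of where the active replica was last known to live. */
typedef struct {
    model_dest_t active_replica;
} model_client_cache_t;

/* Returns true if the client would treat the cached route as still valid,
 * i.e. it cannot tell that the process behind `current` was restarted. */
bool model_route_looks_valid(const model_client_cache_t *c,
                             model_dest_t current)
{
    return model_dest_equal(c->active_replica, current);
}
```

When the restarted process happens to get the same process ID, the cached destination compares equal to the new one, so nothing on the client side signals that the replica behind it is gone.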


---



[tickets] [opensaf:tickets] #1670 cpsv: Checkpoint is destroyed although there is a user using it

2016-09-06 Thread A V Mahesh (AVM)
- **status**: review --> fixed
- **Comment**:

changeset:   8007:661036525753
parent:  8005:36f63cf5aa4d
user:Hoang Vo 
date:Wed Sep 07 09:13:01 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no 
longer used [#1670]
 
changeset:   8008:ba9a421fbacf
branch:  opensaf-5.1.x
parent:  8006:f8bc9f897235
user:Hoang Vo 
date:Wed Sep 07 09:13:25 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no 
longer used [#1670]
 
changeset:   8009:a2713c3caf11
branch:  opensaf-5.0.x
parent:  8001:3e43cfb7d74f
user:Hoang Vo 
date:Wed Sep 07 09:13:42 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no 
longer used [#1670]
 
changeset:   8010:1c50d7f77c2a
branch:  opensaf-4.7.x
tag: tip
parent:  8002:6b58ec847a47
user:Hoang Vo 
date:Wed Sep 07 09:14:00 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no 
longer used [#1670]



---

** [tickets:#1670] cpsv: Checkpoint is destroyed although there is a user 
using it**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Fri Jan 22, 2016 04:09 AM UTC by Pham Hoang Nhat
**Last Updated:** Wed May 04, 2016 05:35 PM UTC
**Owner:** Pham Hoang Nhat


Problem description:

Checkpoint is destroyed although there is a user using it.

Steps to reproduce the problem:
1. Create a checkpoint on PL3 (creation flag SA_CKPT_WR_ALL_REPLICAS and 
retention duration = 0)
2. Open this checkpoint on PL4
3. Restart PL3

After step 3, the checkpoint is destroyed although it was still in use on PL4.
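
The changeset summary for this ticket says that CPD starts the retention duration timer if the checkpoint is no longer used. A minimal model of that rule is sketched below. It is not the CPD implementation: the struct, the user counter, and the `model_*` functions are assumptions used only to show the intended ordering of events.

```c
#include <stdbool.h>

/* Hypothetical per-checkpoint bookkeeping kept by a director process. */
typedef struct {
    unsigned int user_count;      /* nodes that still have it open */
    bool retention_timer_running;
    bool destroyed;
} model_ckpt_t;

/* A node opens the checkpoint: it is in use, so no timer should run. */
void model_ckpt_open(model_ckpt_t *c)
{
    c->user_count++;
    c->retention_timer_running = false;
}

/* A node closes the checkpoint or leaves the cluster. */
void model_ckpt_user_gone(model_ckpt_t *c)
{
    if (c->user_count > 0)
        c->user_count--;
    /* Start the retention timer only when nobody uses the checkpoint. */
    if (c->user_count == 0)
        c->retention_timer_running = true;
}

/* Timer expiry: destroy only if the checkpoint is still unused. */
void model_ckpt_timer_expired(model_ckpt_t *c)
{
    if (c->retention_timer_running && c->user_count == 0)
        c->destroyed = true;
    c->retention_timer_running = false;
}
```

Replaying the reproduction steps against this model (open on PL3, open on PL4, PL3 restarts) leaves one user, so no timer starts and the checkpoint survives even with a retention duration of 0.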



---



[tickets] [opensaf:tickets] #1574 CKPT: Support DNs longer than 255 bytes

2016-09-06 Thread A V Mahesh (AVM)
Pushed to  http://hg.code.sf.net/p/opensaf/documentation

changeset:   186:319b3ffccdc0
tag: tip
user:Hoang Vo 
date:Wed Sep 07 08:58:08 2016 +0530
summary: cpsv: update PR document following Long DN extension [#1574]



---

** [tickets:#1574] CKPT: Support DNs longer than 255 bytes**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Oct 28, 2015 09:46 AM UTC by Pham Hoang Nhat
**Last Updated:** Tue Aug 23, 2016 10:15 AM UTC
**Owner:** Pham Hoang Nhat


Ticket [#191] introduced generic support in OpenSAF for DNs longer than 255 
bytes. Each individual OpenSAF service also has to be adapted to support long 
DNs. CKPT should have this feature, because applications may want to use long 
DNs as checkpoint names.


---



[tickets] [opensaf:tickets] #1962 libplms_hpi.so.0 loading issue - Opensaf with plms services

2016-09-06 Thread Alex Jones
Try the attached patch instead of yours. If you are building just from the 
released tar file instead of from source control, you will need to modify 
Makefile.in in the same directory too.



Attachments:

- [plm-1962.patch](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/06af13af/02af/attachment/plm-1962.patch) (368 Bytes; text/x-diff)


---

** [tickets:#1962] libplms_hpi.so.0 loading issue - Opensaf with plms services**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Fri Aug 19, 2016 08:09 PM UTC by Subrata Nath
**Last Updated:** Mon Sep 05, 2016 02:27 AM UTC
**Owner:** nobody


Hello,

I have installed OpenSAF 5.0.0 without PLMS for node HA purposes, and it works 
fine. Now I would like to use OpenSAF PLMS with OpenHPI.

My configure command is:

./configure CPPFLAGS=-DRUNASROOT OSAF_HARDEN_FLAGS="-fstack-protector-all 
-D_FORTIFY_SOURCE=2" HPI_LIBS="-L/usr/local/lib -lopenhpimarshal -lopenhpiutils 
-lopenhpi" --enable-hpi --with-openhpi --with-hpi-interface=B03 
--enable-tipc=yes --enable-imm-pbe=yes --enable-ais-plm --enable-ais-smf 
--enable-ais-msg --enable-ais-lck --enable-ais-evt --enable-ais-ckpt 
--enable-ntf-imcn 

During OpenSAF startup, the following error message is seen while loading the 
.so:

" ER dlopen() to load libplms_hpi.so failed with error 
/usr/lib64/opensaf/libplms_hpi.so.0: undefined symbol: plms_plmc_error_cbk" 

Has this issue been seen before, or is it a configuration issue on my end? As 
far as I understand, it is caused by a makefile issue.

As a temporary fix, I copied the following three methods (with changed method 
names) from 
opensaf-5.0.0/osaf/services/saf/plmsv/plms/plms_plmc.c to 
opensaf-5.0.0/osaf/services/saf/plmsv/plms/hpi_intf/plms_hsm.c:

void plms_os_information_free(PLMS_PLMC_EE_OS_INFO *os_info)
static SaUint32T plms_os_information_parse(SaInt8T *os_info,
PLMS_PLMC_EE_OS_INFO *evt_os_info)
int32_t plms_plmc_error_callbk(plmc_lib_error *msg)
int32_t plms_plmc_connect_callbk(SaInt8T *ee_id,SaInt8T *msg)
int32_t plms_plmc_udp_callbk(udp_msg *msg)


Could you please check the makefile for 
opensaf-5.0.0/osaf/services/saf/plmsv/plms/hpi_intf/, which has dependencies 
outside that folder?

Regards,
Subrata


---



[tickets] [opensaf:tickets] #2005 smfd: Inconsistent reading of settings

2016-09-06 Thread elunlen



---

** [tickets:#2005] smfd: Inconsistent reading of settings**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 02:35 PM UTC by elunlen
**Last Updated:** Tue Sep 06, 2016 02:35 PM UTC
**Owner:** nobody


SMF reads the IMM settings and updates its cb globals at three points: when it 
is assigned active, in the OI apply callback, and after executing the init 
actions in the campaign init state. This causes strange behaviour regarding 
when changed IMM settings take effect.
Example:
Before executing a campaign that has a long campaign name (> 255 characters), 
longDnsAllowed and smfKeepDuState must both be changed.
1. If smfKeepDuState is changed before longDnsAllowed, the campaign will fail 
because the cb globals are not updated after the change of longDnsAllowed.
2. If longDnsAllowed is changed before smfKeepDuState, the campaign will 
succeed because cb is updated with the new longDnsAllowed setting when the OI 
apply callback is called for the smfKeepDuState change.
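
The order dependence in the two cases can be reproduced with a small model of a cached settings block. It is a sketch, not smfd code: the structs and `model_*` functions are hypothetical, and it assumes (as the ticket's description implies) that a change of longDnsAllowed triggers no refresh, while a change of smfKeepDuState triggers the apply callback, which re-reads everything.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the values stored in IMM. */
typedef struct {
    bool long_dns_allowed;
    bool keep_du_state;
} model_imm_t;

/* Hypothetical stand-in for the cached copy in the daemon's control block. */
typedef struct {
    bool long_dns_allowed;
    bool keep_du_state;
} model_cb_t;

/* Re-reads every setting; in this model it runs only from the apply
 * callback. */
static void model_refresh(model_cb_t *cb, const model_imm_t *imm)
{
    cb->long_dns_allowed = imm->long_dns_allowed;
    cb->keep_du_state = imm->keep_du_state;
}

/* Changing longDnsAllowed: no callback reaches the daemon, so the cache
 * stays stale. */
void model_set_long_dns_allowed(model_imm_t *imm, bool v)
{
    imm->long_dns_allowed = v;
}

/* Changing smfKeepDuState: the apply callback fires and refreshes the
 * whole cache. */
void model_set_keep_du_state(model_imm_t *imm, model_cb_t *cb, bool v)
{
    imm->keep_du_state = v;
    model_refresh(cb, imm);
}
```

Setting smfKeepDuState first and longDnsAllowed second leaves the cached longDnsAllowed at its old value (case 1), while the opposite order picks up both changes (case 2).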


---



[tickets] [opensaf:tickets] #2004 SMF: smfd got crashed when triggered campaign for application upgrade.

2016-09-06 Thread Madhurika Koppula
Attaching the smfd crash stack trace.
GCC version: 6.1.0.



Attachments:

- [smfd_crash.rtf](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/85e4f056/ee2c/attachment/smfd_crash.rtf) (10.1 kB; application/rtf)


---

** [tickets:#2004] SMF: smfd got crashed when triggered campaign for 
application upgrade.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:35 AM UTC by Madhurika Koppula
**Last Updated:** Tue Sep 06, 2016 08:35 AM UTC
**Owner:** nobody
**Attachments:**

- [smf.tgz](https://sourceforge.net/p/opensaf/tickets/2004/attachment/smf.tgz) 
(1.6 MB; application/octet-stream)


**Environment Details:**

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled ).

**Summary:**

smfd crashed due to a segfault on the active controller.

**Steps followed & Observed behaviour:**

Test SGupgrade of 2N model with valid configurations.

**Observations:**

The active controller rebooted due to avaDown for smfd.

Below is the snippet of syslog on active controller:

Sep  6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO 
SmfProcedureThread::getImmProcedure, IMM data for procedure 
safSmfProc=amfClusterProc-1,safSmfCampaign=Campaign2,safApp=safSmfService not 
found
Sep  6 11:52:19 SLES-M-SLOT-1 osafimmnd[3661]: NO Implementer connected: 20 
(safSmfProc1) <662, 2010f>
Sep  6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start upgrade procedure 
safSmfProc=amfClusterProc-1
Sep  6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start procedure init 
actions

Sep  6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: NO 
'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

**Sep  6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: ER 
safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast**

Sep  6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60

Below is the snippet of osafsmfd trace on active controller:

Sep  6 11:52:19.808986 osafsmfd [3745:SmfUpgradeProcedure.cc:0741] TR 
SmfUpgradeProcedure::calculateRollingSteps:calculateRollingSteps new SW install 
step added safSmfStep=0003 (with no act/deact unit) for node 
safAmfNode=PL-4,safAmfCluster=myAmfCluster
Sep  6 11:52:19.808995 osafsmfd [3745:SmfUpgradeProcedure.cc:1876] >> 
addStepModifications
Sep  6 11:52:19.809002 osafsmfd [3745:SmfUpgradeProcedure.cc:1931] >> 
addStepModificationsNode
Sep  6 11:52:19.809008 osafsmfd [3745:imma_om_api.c:0160] >> saImmOmInitialize
Sep  6 11:52:19.809015 osafsmfd [3745:imma_om_api.c:0186] TR OM client version 
A.2.1
Sep  6 11:52:19.809021 osafsmfd [3745:imma_om_api.c:0228] >> initialize_common
Sep  6 11:52:19.809026 osafsmfd [3745:imma_init.c:0275] >> imma_startup: use 
count 1
Sep  6 11:52:19.809032 osafsmfd [3745:imma_init.c:0298] << imma_startup: use 
count 2
Sep  6 11:52:19.809040 osafsmfd [3745:imma_om_api.c:0246] T2 IMMA library 
syncronous timeout set to:3
Sep  6 11:52:19.809263 osafsmfd [3745:imma_om_api.c:0349] T1 Trying to add OM 
client id:727 node:2010f
Sep  6 11:52:19.809280 osafsmfd [3745:imma_om_api.c:0442] << initialize_common
Sep  6 11:52:19.809287 osafsmfd [3745:imma_om_api.c:0214] << saImmOmInitialize
Sep  6 11:52:19.809293 osafsmfd [3745:imma_om_api.c:0931] >> 
saImmOmAdminOwnerInitialize
Sep  6 11:52:19.811060 osafsmfd [3745:imma_om_api.c:1143] T1 Admin owner init 
successful
Sep  6 11:52:19.811076 osafsmfd [3745:imma_om_api.c:1144] << 
saImmOmAdminOwnerInitialize
Sep  6 11:52:19.811083 osafsmfd [3745:imma_om_api.c:5528] >> 
saImmOmAccessorInitialize
Sep  6 11:52:19.811091 osafsmfd [3745:imma_om_api.c:5626] << 
saImmOmAccessorInitialize
Sep  6 12:21:09.873661 osafsmfd [2421:ncs_main_pub.c:0220] TR
NCS:PROCESS_ID=2421

Attachments:
Active Controller:
1)syslog
2)osafsmfd, osafsmfnd traces.
3)osafimmnd traces.



---



[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke returns ERR_TIMEOUT

2016-09-06 Thread Chani Srivastava
- **summary**: IMM: AdminOperation returns BAD_HANDLE when invoked second time 
--> IMM: Owner handle is getting corrupt when OmAdminOperationInvoke returns 
ERR_TIMEOUT
- Description has changed:

Diff:



--- old
+++ new
@@ -5,7 +5,7 @@
 Summary:
 Steps to Reproduce
 1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in callback with 
time more that OI_CALLBACK_TIMEOUT value
-2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait
+2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait OR Invoke 
any Ccb operation
 
 Observed Bahavior:
 Step1 will return SA_AIS_ERR_TIMEOUT (Expected)






---

** [tickets:#2001] IMM: Owner handle is getting corrupt when 
OmAdminOperationInvoke returns ERR_TIMEOUT**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 06, 2016 07:18 AM UTC
**Owner:** nobody
**Attachments:**

- [AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip) (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback 
for longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait, OR invoke 
any CCB operation

Observed Behavior:
Step 1 returns SA_AIS_ERR_TIMEOUT (expected)
Step 2 returns SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 
svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over 
MDS. Discarding admin op reply.
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 
21 - ignoring
Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down 
on syncronous request, discarding request
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 
2010f> (testOiTmout_verifyAdminOpCallback_37)


Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached


---



[tickets] [opensaf:tickets] #1999 osafntfd on active controller crashed while logging to alarm stream

2016-09-06 Thread Srikanth R
- **summary**: LOG : ntfd  on active controller crashed while logging to alarm 
stream --> osafntfd on active controller crashed while logging to alarm stream
- **Component**: log --> ntf
- **Comment**:

After the integration of LOG with CLM (#1638), all LOG clients should 
reinitialize after a CLM unlock operation. It might be that NTF, as a LOG 
client, did not reinitialize after the CLM unlock and therefore got the return 
value 31.
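
The reinitialization this comment asks for can be sketched as "on 31, reinitialize and retry once" instead of aborting. This is a simulation under assumptions, not ntfd or LOG agent code: the client struct and `model_*` functions are hypothetical, and mapping the return value 31 to an "unavailable" error is an assumption about what the ticket's number means.

```c
#include <stdbool.h>

#define MODEL_OK 1
#define MODEL_ERR_UNAVAILABLE 31 /* assumed meaning of the return value 31 */

/* Hypothetical log client: a handle that goes stale after a CLM lock. */
typedef struct {
    bool handle_stale;
    unsigned int reinit_count;
} model_log_client_t;

/* Simulated write: fails with 31 while the handle is stale. */
static int model_log_write(const model_log_client_t *c)
{
    return c->handle_stale ? MODEL_ERR_UNAVAILABLE : MODEL_OK;
}

/* Simulated finalize + initialize + stream open. */
static void model_log_reinit(model_log_client_t *c)
{
    c->handle_stale = false;
    c->reinit_count++;
}

/* Write once; on 31, reinitialize and retry a single time. */
int model_log_write_with_reinit(model_log_client_t *c)
{
    int rc = model_log_write(c);
    if (rc == MODEL_ERR_UNAVAILABLE) {
        model_log_reinit(c);
        rc = model_log_write(c);
    }
    return rc;
}
```

In a real client the reinitialization could itself fail, for example while the node is still locked, so the caller would still need to handle a second failure without aborting.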



---

** [tickets:#1999] osafntfd on active controller crashed while logging to alarm 
stream**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:09 AM UTC
**Owner:** nobody




---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2003 amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.

2016-09-06 Thread Praveen
- **status**: accepted --> review



---

** [tickets:#2003] amf: SG unstable when SU moves to TERM_FAILED state during 
fresh assignments.**

**Status:** review
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:31 AM UTC by Praveen
**Last Updated:** Tue Sep 06, 2016 08:32 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[term_failed.tgz](https://sourceforge.net/p/opensaf/tickets/2003/attachment/term_failed.tgz)
 (30.1 kB; application/x-compressed)


Conf: 2N model, one NPI comp in NPI SU.
Steps to reproduce:
1) Add application using the immcfg command.
2) Lock SG.
3) Unlock-in and unlock SUs.
4) Arrange for the instantiation and cleanup scripts to return a non-zero 
status.
5) Unlock SG.

When the SG is unlocked, AMFND initiates active assignments by instantiating 
the only component. After the instantiation failure, AMFND tries to clean up 
the component. Cleanup fails. AMFND marks the comp and SU as TERM_FAILED, but 
it neither responds to AMFD for the completion of the assignment nor sends any 
recovery request. Because of this, the SG remains unstable in REALIGN state. In 
this state, no admin operation is allowed.
Traces are attached.

Even though this issue seems similar to #538, it differs in one aspect. 
In #538, the SU moves to TERM_FAILED state and there is a possibility of 
failover/switchover because standby assignments are present.
In the present case, the failure happened during initial assignments, so there 
is no standby to switch over or fail over to.
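
The stuck state described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `SgModel`, its state strings and the message kinds are hypothetical names chosen for illustration and are not taken from the AMF code. It shows that if the director leaves REALIGN only when the node director answers, a silent TERM_FAILED keeps admin operations blocked.

```python
class SgModel:
    """Toy view of an SG as tracked by the director (hypothetical names)."""

    def __init__(self):
        self.state = "STABLE"

    def unlock(self):
        # The director orders fresh assignments and waits for the node director.
        self.state = "REALIGN"

    def on_nd_message(self, kind):
        # Assumed for this sketch: either message lets the SG settle again.
        if kind in ("assignment_response", "recovery_request"):
            self.state = "STABLE"

    def admin_op_allowed(self):
        return self.state == "STABLE"


def run_scenario(nd_reports_failure):
    """Return (SG state, admin op allowed) after instantiation and cleanup fail."""
    sg = SgModel()
    sg.unlock()
    # Instantiation fails, cleanup fails: comp and SU become TERM_FAILED.
    if nd_reports_failure:
        sg.on_nd_message("assignment_response")
    return sg.state, sg.admin_op_allowed()


print(run_scenario(nd_reports_failure=False))  # behaviour reported in this ticket
print(run_scenario(nd_reports_failure=True))   # one possible expected behaviour
```

Whether the right answer is an assignment response or a recovery request is left open here; the sketch only shows why some message is needed.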





[tickets] [opensaf:tickets] #2004 SMF: smfd got crashed when triggered campaign for application upgrade.

2016-09-06 Thread Madhurika Koppula



---

** [tickets:#2004] SMF: smfd got crashed when triggered campaign for 
application upgrade.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:35 AM UTC by Madhurika Koppula
**Last Updated:** Tue Sep 06, 2016 08:35 AM UTC
**Owner:** nobody
**Attachments:**

- [smf.tgz](https://sourceforge.net/p/opensaf/tickets/2004/attachment/smf.tgz) 
(1.6 MB; application/octet-stream)


**Environment Details:**

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled ).

**Summary:**

smfd crashed with a segfault on the active controller.

**Steps followed & Observed behaviour:**

Test SG upgrade of a 2N model with valid configurations.

**Observations:**

The active controller rebooted due to avaDown for smfd.

Below is the snippet of syslog on active controller:

Sep  6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO 
SmfProcedureThread::getImmProcedure, IMM data for procedure 
safSmfProc=amfClusterProc-1,safSmfCampaign=Campaign2,safApp=safSmfService not 
found
Sep  6 11:52:19 SLES-M-SLOT-1 osafimmnd[3661]: NO Implementer connected: 20 
(safSmfProc1) <662, 2010f>
Sep  6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start upgrade procedure 
safSmfProc=amfClusterProc-1
Sep  6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start procedure init 
actions

Sep  6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: NO 
'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

**Sep  6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: ER 
safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast**

Sep  6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60

Below is the snippet of osafsmfd trace on active controller:

Sep  6 11:52:19.808986 osafsmfd [3745:SmfUpgradeProcedure.cc:0741] TR 
SmfUpgradeProcedure::calculateRollingSteps:calculateRollingSteps new SW install 
step added safSmfStep=0003 (with no act/deact unit) for node 
safAmfNode=PL-4,safAmfCluster=myAmfCluster
Sep  6 11:52:19.808995 osafsmfd [3745:SmfUpgradeProcedure.cc:1876] >> 
addStepModifications
Sep  6 11:52:19.809002 osafsmfd [3745:SmfUpgradeProcedure.cc:1931] >> 
addStepModificationsNode
Sep  6 11:52:19.809008 osafsmfd [3745:imma_om_api.c:0160] >> saImmOmInitialize
Sep  6 11:52:19.809015 osafsmfd [3745:imma_om_api.c:0186] TR OM client version 
A.2.1
Sep  6 11:52:19.809021 osafsmfd [3745:imma_om_api.c:0228] >> initialize_common
Sep  6 11:52:19.809026 osafsmfd [3745:imma_init.c:0275] >> imma_startup: use 
count 1
Sep  6 11:52:19.809032 osafsmfd [3745:imma_init.c:0298] << imma_startup: use 
count 2
Sep  6 11:52:19.809040 osafsmfd [3745:imma_om_api.c:0246] T2 IMMA library 
syncronous timeout set to:3
Sep  6 11:52:19.809263 osafsmfd [3745:imma_om_api.c:0349] T1 Trying to add OM 
client id:727 node:2010f
Sep  6 11:52:19.809280 osafsmfd [3745:imma_om_api.c:0442] << initialize_common
Sep  6 11:52:19.809287 osafsmfd [3745:imma_om_api.c:0214] << saImmOmInitialize
Sep  6 11:52:19.809293 osafsmfd [3745:imma_om_api.c:0931] >> 
saImmOmAdminOwnerInitialize
Sep  6 11:52:19.811060 osafsmfd [3745:imma_om_api.c:1143] T1 Admin owner init 
successful
Sep  6 11:52:19.811076 osafsmfd [3745:imma_om_api.c:1144] << 
saImmOmAdminOwnerInitialize
Sep  6 11:52:19.811083 osafsmfd [3745:imma_om_api.c:5528] >> 
saImmOmAccessorInitialize
Sep  6 11:52:19.811091 osafsmfd [3745:imma_om_api.c:5626] << 
saImmOmAccessorInitialize
Sep  6 12:21:09.873661 osafsmfd [2421:ncs_main_pub.c:0220] TR
NCS:PROCESS_ID=2421

Attachments:
Active Controller:
1) syslog
2) osafsmfd, osafsmfnd traces.
3) osafimmnd traces.
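
When reading a trace excerpt like the one above, it can help to list the functions that were entered (`>>`) but never left (`<<`) before the process restarted. The following is a minimal sketch, not an OpenSAF tool: `open_frames` and the regular expression are hypothetical helpers, and the input lines are assumed to have been rejoined, since the mail wraps them.

```python
import re

# Matches the "[pid:file:line] >> name" or "<< name" part of a trace line.
TRACE_RE = re.compile(r"\[\d+:[^\]]+\]\s+(>>|<<)\s+(\w+)")


def open_frames(lines):
    """Return functions entered ('>>') but not yet left ('<<'), outermost first."""
    stack = []
    for line in lines:
        m = TRACE_RE.search(line)
        if not m:
            continue  # ordinary TR/T1/T2 message, not an entry/exit marker
        marker, func = m.groups()
        if marker == ">>":
            stack.append(func)
        elif stack and stack[-1] == func:
            stack.pop()
    return stack


sample = [
    "Sep  6 11:52:19.808995 osafsmfd [3745:SmfUpgradeProcedure.cc:1876] >> addStepModifications",
    "Sep  6 11:52:19.809002 osafsmfd [3745:SmfUpgradeProcedure.cc:1931] >> addStepModificationsNode",
    "Sep  6 11:52:19.809008 osafsmfd [3745:imma_om_api.c:0160] >> saImmOmInitialize",
    "Sep  6 11:52:19.809287 osafsmfd [3745:imma_om_api.c:0214] << saImmOmInitialize",
    "Sep  6 11:52:19.811083 osafsmfd [3745:imma_om_api.c:5528] >> saImmOmAccessorInitialize",
    "Sep  6 11:52:19.811091 osafsmfd [3745:imma_om_api.c:5626] << saImmOmAccessorInitialize",
]
print(open_frames(sample))
```

For the lines taken from this ticket, the frames still open at the end of the excerpt are the two `SmfUpgradeProcedure.cc` functions, which narrows down where to look in the core dump.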





[tickets] [opensaf:tickets] #2003 amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.

2016-09-06 Thread Praveen
- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Component**: unknown --> amf
- **Part**: - --> nd



---

** [tickets:#2003] amf: SG unstable when SU moves to TERM_FAILED state during 
fresh assignments.**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:31 AM UTC by Praveen
**Last Updated:** Tue Sep 06, 2016 08:31 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[term_failed.tgz](https://sourceforge.net/p/opensaf/tickets/2003/attachment/term_failed.tgz)
 (30.1 kB; application/x-compressed)


Conf: 2N model, one NPI comp in NPI SU.
Steps to reproduce:
1) Add application using the immcfg command.
2) Lock SG.
3) Unlock-in and unlock SUs.
4) Arrange for the instantiation and cleanup scripts to return a non-zero 
status.
5) Unlock SG.

When the SG is unlocked, AMFND initiates active assignments by instantiating 
the only component. After the instantiation failure, AMFND tries to clean up 
the component. Cleanup fails. AMFND marks the comp and SU as TERM_FAILED, but 
it neither responds to AMFD for the completion of the assignment nor sends any 
recovery request. Because of this, the SG remains unstable in REALIGN state. In 
this state, no admin operation is allowed.
Traces are attached.

Even though this issue seems similar to #538, it differs in one aspect. 
In #538, the SU moves to TERM_FAILED state and there is a possibility of 
failover/switchover because standby assignments are present.
In the present case, the failure happened during initial assignments, so there 
is no standby to switch over or fail over to.





[tickets] [opensaf:tickets] #2003 amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.

2016-09-06 Thread Praveen



---

** [tickets:#2003] amf: SG unstable when SU moves to TERM_FAILED state during 
fresh assignments.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:31 AM UTC by Praveen
**Last Updated:** Tue Sep 06, 2016 08:31 AM UTC
**Owner:** nobody
**Attachments:**

- 
[term_failed.tgz](https://sourceforge.net/p/opensaf/tickets/2003/attachment/term_failed.tgz)
 (30.1 kB; application/x-compressed)


Conf: 2N model, one NPI comp in NPI SU.
Steps to reproduce:
1) Add application using the immcfg command.
2) Lock SG.
3) Unlock-in and unlock SUs.
4) Arrange for the instantiation and cleanup scripts to return a non-zero 
status.
5) Unlock SG.

When the SG is unlocked, AMFND initiates active assignments by instantiating 
the only component. After the instantiation failure, AMFND tries to clean up 
the component. Cleanup fails. AMFND marks the comp and SU as TERM_FAILED, but 
it neither responds to AMFD for the completion of the assignment nor sends any 
recovery request. Because of this, the SG remains unstable in REALIGN state. In 
this state, no admin operation is allowed.
Traces are attached.

Even though this issue seems similar to #538, it differs in one aspect. 
In #538, the SU moves to TERM_FAILED state and there is a possibility of 
failover/switchover because standby assignments are present.
In the present case, the failure happened during initial assignments, so there 
is no standby to switch over or fail over to.





[tickets] [opensaf:tickets] #1988 AMF: Admin operation continuation does not work with short cluster init timeout

2016-09-06 Thread Minh Hon Chau
The attribute saAmfClusterStartupTimeout is currently set to 10 sec by default. 
The timer is started only once all NCS SUs of the active controller have been 
assigned. In big clusters, if the timeout is left at 10 sec, many nodes have 
not yet joined the cluster and many SUs are still out of service when it 
expires, so AMFD cannot start assignments at cluster init timeout.
Aug 19 12:32:05.923649 osafamfd [6705:timer.cc:0066] >> avd_start_tmr: 1
Aug 19 12:32:15.987858 osafamfd [6705:cluster.cc:0055] >> 
avd_cluster_tmr_init_evh 

Aug 19 12:32:15.988226 osafamfd [6705:sg_2n_fsm.cc:2808] >> realign: 
'safSg=2N,safApp=ABC-01'
Aug 19 12:32:15.988254 osafamfd [6705:sg_2n_fsm.cc:0606] TR No in service SUs 
available in the SG

Aug 19 12:32:15.988640 osafamfd [6705:sg_2n_fsm.cc:2808] >> realign: 
'safSg=2N,safApp=ABC-02'
Aug 19 12:32:15.988661 osafamfd [6705:sg_2n_fsm.cc:0606] TR No in service SUs 
available in the SG

However, this does not cause any problem in the cluster start-up scenario, 
because AMFD also starts assignments upon receiving avd_su_oper_state_evh() by 
calling su_insvc(). This happens after a node completes joining the cluster. 
The earlier a node joins the cluster, the better the chance that its SU is 
assigned active.

Also, if not all NCS SUs of the active controller have been assigned, the cb 
state is not INIT_DONE and AMFD rejects the node_up msg of all other nodes.

In admin operation continuation after headless, AMFD cannot follow a similar 
sequence, because the way an SU gets a fresh assignment (su_insvc) differs from 
the way an SU continues its pending assignment (susi_success). AMFD needs all 
nodes to have joined the cluster before continuing an admin operation.
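
The 10 sec default can be cross-checked against the two timer lines in the trace above. This is a small sketch with hypothetical helper names; it assumes that the `avd_start_tmr` line marks the start of the cluster startup timer, that `avd_cluster_tmr_init_evh` marks its expiry, and that both lines were logged on the same day.

```python
from datetime import datetime


def trace_time(line):
    """Parse the HH:MM:SS.ffffff field (third whitespace-separated field)."""
    return datetime.strptime(line.split()[2], "%H:%M:%S.%f")


def elapsed_seconds(start_line, end_line):
    # Assumes both lines were logged on the same day.
    return (trace_time(end_line) - trace_time(start_line)).total_seconds()


start = "Aug 19 12:32:05.923649 osafamfd [6705:timer.cc:0066] >> avd_start_tmr: 1"
expiry = "Aug 19 12:32:15.987858 osafamfd [6705:cluster.cc:0055] >> avd_cluster_tmr_init_evh"
print(round(elapsed_seconds(start, expiry), 3))
```

The result is slightly above 10 seconds, consistent with the default mentioned in the comment.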


---

** [tickets:#1988] AMF: Admin operation continuation does not work with short 
cluster init timeout**

**Status:** assigned
**Milestone:** 5.1.RC1
**Created:** Wed Aug 31, 2016 12:04 AM UTC by Minh Hon Chau
**Last Updated:** Wed Aug 31, 2016 12:04 AM UTC
**Owner:** Minh Hon Chau


In the scenario of admin continuation after headless, if 
saAmfClusterStartupTimeout is configured with a short value, the admin 
continuation is initiated when saAmfClusterStartupTimeout expires while the SU 
is still OUT OF SERVICE. The eventual result is failure of the admin operation 
after headless.




[tickets] [opensaf:tickets] #2002 CLM : Agent crashed for invalid check in buffer notification parameter

2016-09-06 Thread Srikanth R



---

** [tickets:#2002] CLM : Agent crashed for invalid check in buffer notification 
parameter**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 08:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:15 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4



Steps followed & Observed behaviour
--

-> Call the saClmClusterTrack_4 API with the CURRENT flag and the buffer 
parameter populated. Here the buffer parameter is populated by allocating 
sufficient memory for numberOfItems, but notification contains garbage values.

The agent crashes with the following backtrace if notification contains 
garbage values.

 -> #3  0x7f4ccb370c9f in osaf_extended_name_length (name=0x9d5e4e) at 
osaf_extended_name.c:139
-> #4  0x7f4cca9ff27c in clma_validate_flags_buf_4 (hdl_rec=0x97cbc0, 
flags=1 '\001', buf=0x97c190) at clma_api.c:183
->#5  0x7f4ccaa00fe5 in clmaclustertrack (clmHandle=4290772993, flags=1 
'\001', buf=0x0, buf_4=0x97c190) at clma_api.c:1032
->#6  0x7f4ccaa00d40 in saClmClusterTrack_4 (clmHandle=4290772993, flags=1 
'\001', buf=0x97c190) at clma_api.c:958


Expected behaviour
--
If the buffer parameter is NULL, CLM shall invoke a callback. If the buffer 
parameter is not NULL, CLM should check only the value of numberOfItems and 
evaluate whether the user has allocated sufficient memory.

With the #1906 changes, the contents of notification are also verified, but 
only the structure member numberOfItems should be verified.
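
The expected behaviour above can be sketched as a check that looks at numberOfItems only and never reads the notification entries. This is an illustrative model, not the CLM agent code: `NotificationBuffer`, `check_track_buffer`, `cluster_size` and the `USE_CALLBACK` marker are hypothetical, and returning `SA_AIS_ERR_NO_SPACE` for a too-small buffer is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationBuffer:
    """Stand-in for the CLM notification buffer (field names are illustrative)."""
    number_of_items: int
    notification: list = field(default_factory=list)  # may hold garbage; never read here


def check_track_buffer(buf, cluster_size):
    if buf is None:
        return "USE_CALLBACK"          # no buffer: result is delivered via callback
    if buf.number_of_items < cluster_size:
        return "SA_AIS_ERR_NO_SPACE"   # assumed return code for a too-small buffer
    return "SA_AIS_OK"


garbage = NotificationBuffer(number_of_items=5, notification=[object()] * 5)
print(check_track_buffer(garbage, cluster_size=5))
```

Because the entries are never dereferenced, uninitialized contents cannot trigger the assertion seen in the backtrace.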





[tickets] [opensaf:tickets] #1998 amf: protection group track non existing csi returns SA_AIS_ERR_INIT

2016-09-06 Thread Long HB Nguyen
- **status**: review --> fixed
- **Comment**:

default: [staging:36f63c]
changeset:   8005:36f63cf5aa4d
parent:      8003:4dfd86ce806e
user:        Long Nguyen
date:        Tue Sep 06 17:10:19 2016 +1000
summary:     amfa: fix pg track returns SA_AIS_ERR_INIT [#1998]

opensaf-5.1.x: [staging:f8bc9f]
changeset:   8006:f8bc9f897235
branch:      opensaf-5.1.x
tag:         tip
parent:      8004:a7ed45608a5b
user:        Long Nguyen
date:        Tue Sep 06 17:12:58 2016 +1000
summary:     amfa: fix pg track returns SA_AIS_ERR_INIT [#1998]




---

** [tickets:#1998] amf: protection group track non existing csi returns 
SA_AIS_ERR_INIT**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Mon Sep 05, 2016 07:22 AM UTC by Long HB Nguyen
**Last Updated:** Tue Sep 06, 2016 03:02 AM UTC
**Owner:** Long HB Nguyen


Steps to reproduce
--
- Use 2N model.
- Modify amf_demo.c as follows:
+ Initialize amf_demo with saAmfInitialize_4 or saAmfInitialize_o4.
+ Add a callback for protection group.
+ Call saAmfProtectionGroupTrack with a non-existing csi (e.g. "dummy" 
csi), with the flag SA_TRACK_CURRENT and notificationBuffer set to NULL.

Observed behaviour
--
Before the patches for #1553 were pushed, the testcase returned 
SA_AIS_ERR_NOT_EXIST.
After the patches for #1553 were pushed, the testcase returns 
SA_AIS_ERR_INIT.


Initial investigation:
--
In the patches for #1553, Praveen added an internal callback structure 
(OsafAmfCallbacksT).
The structure splits the protection group track callback into two cases:
- SaAmfProtectionGroupTrackCallbackT for versions older than B.04.01.
- SaAmfProtectionGroupTrackCallbackT_4 for versions from B.04.01.

Consider the case where amf_demo is initialized with callbacks for B.04.01 
(i.e. saAmfProtectionGroupTrackCallback_4 is set). When amf_demo calls 
saAmfProtectionGroupTrack, amfa checks saAmfProtectionGroupTrackCallback, which 
is NULL in this case, and therefore returns SA_AIS_ERR_INIT.
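
The investigation above suggests that the callback to check depends on the version used at initialization. The following is a rough sketch of that idea and not the actual amfa patch: `registered_pg_callback`, `pg_track`, `known_csis` and the dictionary-based callback table are hypothetical stand-ins, and the agent-side CSI lookup is a simplification.

```python
CB_OLD = "saAmfProtectionGroupTrackCallback"
CB_B0401 = "saAmfProtectionGroupTrackCallback_4"


def registered_pg_callback(version, callbacks):
    """Pick the callback slot that matches the version used at initialization."""
    slot = CB_B0401 if version >= ("B", 4, 1) else CB_OLD
    return callbacks.get(slot)


def pg_track(version, callbacks, csi_name, notification_buffer, known_csis):
    # SA_AIS_ERR_INIT only when the result can be delivered nowhere:
    # no buffer and no callback registered for this version.
    if notification_buffer is None and registered_pg_callback(version, callbacks) is None:
        return "SA_AIS_ERR_INIT"
    if csi_name not in known_csis:
        return "SA_AIS_ERR_NOT_EXIST"
    return "SA_AIS_OK"


callbacks = {CB_B0401: lambda *args: None}
print(pg_track(("B", 4, 1), callbacks, "dummy", None, known_csis=set()))
```

With the version-aware lookup, the scenario from this ticket reaches the CSI check and reports that the csi does not exist.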





[tickets] [opensaf:tickets] #1999 LOG : ntfd on active controller crashed while logging to alarm stream

2016-09-06 Thread Vu Minh Nguyen
This may be caused by the bug reported in ticket [#1985]:
[osaf/services/saf/logsv/lgs/lgs_clm.cc:120]: (error) Uninitialized variable: rc

That ticket is in review status.


---

** [tickets:#1999] LOG : ntfd  on active controller crashed while logging to 
alarm stream**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 05:15 AM UTC
**Owner:** nobody


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & 
no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Summary :
--
NTFD crashed on active controller, while logging notification to alarm stream.


Steps followed & Observed behaviour
--
 -> Initially performed a couple of switchovers and tests on AMF application.
 -> Performed CLM lock operation of standby SC-1 and later unlocked.
 -> Performed switchover such that SC-1 became active controller.
 -> Stopped opensafd on PL-4. NTFD on active controller crashed.
 
Sep  6 10:18:25 CONTROLLER-1 osafamfd[2262]: NO Node 'PL-4' left the cluster
..
Sep  6 10:18:25 CONTROLLER-1 osafntfd[2242]: osaf_abort(31) called from 
0x414d1e with errno=11
Sep  6 10:18:25 CONTROLLER-1 osafamfnd[2272]: NO 
'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

-> Below is the excerpt from the ntfd trace.

Sep  6 10:18:25.436394 osafntfd [2242:NtfAdmin.cc:0252] T2 New notification 
received, id: 682
Sep  6 10:18:25.436398 osafntfd [2242:NtfAdmin.cc:0187] >> processNotification
Sep  6 10:18:25.436404 osafntfd [2242:NtfNotification.cc:0045] T3 constructor 
0x685790, notId: 682
Sep  6 10:18:25.436409 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep  6 10:18:25.436412 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep  6 10:18:25.436425 osafntfd [2242:NtfAdmin.cc:0200] T2 notification 682 
with type 16384 added, notificationMap size is 1
Sep  6 10:18:25.436431 osafntfd [2242:NtfLogger.cc:0130] >> log
Sep  6 10:18:25.436435 osafntfd [2242:NtfLogger.cc:0132] T2 notification Id=682 
received in logger with size 0
Sep  6 10:18:25.436439 osafntfd [2242:NtfLogger.cc:0135] T2 IS LOCAL, logging
Sep  6 10:18:25.436442 osafntfd [2242:NtfLogger.cc:0166] >> checkQueueAndLog
Sep  6 10:18:25.436447 osafntfd [2242:NtfLogger.cc:0196] >> logNotification
Sep  6 10:18:25.436452 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep  6 10:18:25.436455 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep  6 10:18:25.436460 osafntfd [2242:NtfLogger.cc:0231] T2 Logging 
notification to alarm stream
Sep  6 10:18:25.436495 osafntfd [2242:lga_api.c:1151] >> saLogWriteLogAsync
Sep  6 10:18:25.436500 osafntfd [2242:lga_api.c:1015] >> handle_log_record
Sep  6 10:18:25.436507 osafntfd [2242:lga_api.c:1110] << handle_log_record
Sep  6 10:18:25.436518 osafntfd [2242:lga_api.c:1229] TR **saLogWriteLogAsync 
Node not CLM member or stale client**
Sep  6 10:18:25.436524 osafntfd [2242:lga_api.c:1320] << saLogWriteLogAsync
Sep  6 10:18:42.472616 osafntfd [2176:ntfs_main.c:0181] >> initialize






[tickets] [opensaf:tickets] #2001 IMM: AdminOperation returns BAD_HANDLE when invoked second time

2016-09-06 Thread Chani Srivastava
- Description has changed:

Diff:



--- old
+++ new
@@ -11,6 +11,18 @@
 Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
 Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)
 
-Note: Test passed in OpenSAF release 5.0
+Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
+Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done 
pid:1147 svid:26 file:/tmp/imma_oi_callbacktimeout.trace
+Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
+Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no 
response
+Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
+Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over 
MDS. Discarding admin op reply.
+Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message 
type 21 - ignoring
+Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went 
down on syncronous request, discarding request
+Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
+Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 
2010f> (testOiTmout_verifyAdminOpCallback_37)
+
+
+Note: **Test passed in OpenSAF release 5.0**
 
 Agent traces and immnd, immd traces attached






---

** [tickets:#2001] IMM: AdminOperation returns BAD_HANDLE when invoked second 
time**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 06, 2016 07:14 AM UTC
**Owner:** nobody
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() and wait in the callback for 
longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait

Observed Behavior:
Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 
svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over 
MDS. Discarding admin op reply.
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 
21 - ignoring
Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down 
on syncronous request, discarding request
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 
2010f> (testOiTmout_verifyAdminOpCallback_37)


Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached




[tickets] [opensaf:tickets] #2001 IMM: AdminOperation returns BAD_HANDLE when invoked second time

2016-09-06 Thread Chani Srivastava



---

** [tickets:#2001] IMM: AdminOperation returns BAD_HANDLE when invoked second 
time**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 06, 2016 07:14 AM UTC
**Owner:** nobody
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() and wait in the callback for 
longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait

Observed Behavior:
Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Note: Test passed in OpenSAF release 5.0

Agent traces and immnd, immd traces attached




[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller

2016-09-06 Thread Ritu Raj



---

** [tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the 
controller**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 06, 2016 06:04 AM UTC
**Owner:** nobody
**Attachments:**

- 
[Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog)
 (716.7 kB; application/octet-stream)
- 
[Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog)
 (696.4 kB; application/octet-stream)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :
--
Cluster reset happened because the assertion 'length < 
SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd

Steps followed & Observed behaviour
--
1.  Invoked failover 
2.  After a few successful failovers, the new active controller rebooted 
because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. 
While the previous active controller was joining the cluster in the standby 
role, a cluster reset happened. 
[Timeline: Sep  6 00:13:02 sofo-s2]

Sep  6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, 
dest:13)
Sep  6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: 
osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' 
failed.
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: NO 
'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: ER 
safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:
1. Syslog attached
2. msgnd & msgd traces not enabled
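
For readers unfamiliar with the assertion in the syslog above, the sketch below models the kind of check it expresses. It is not the code from osaf_extended_name.c: the function name, the marker constant and its value are placeholders assumed for illustration, and the limit of 256 is taken to be the classic SaNameT size. The point is that a length field that is neither a valid short length nor the extended-name marker, as happens with uninitialized or corrupted names, trips the assertion.

```python
SA_MAX_UNEXTENDED_NAME_LENGTH = 256  # assumed classic SaNameT limit
EXTENDED_NAME_MAGIC = 0xCD2B         # placeholder marker value for this sketch


def extended_name_length(length_field, value):
    """Model of a length lookup guarded by the assertion seen in the syslog."""
    if length_field == EXTENDED_NAME_MAGIC:
        # Extended form: the real length comes from the referenced string.
        return len(value)
    # Classic form: the length field must fit the fixed-size buffer.
    if not length_field < SA_MAX_UNEXTENDED_NAME_LENGTH:
        raise AssertionError("length < SA_MAX_UNEXTENDED_NAME_LENGTH")
    return min(length_field, len(value))


print(extended_name_length(5, b"queue"))
try:
    extended_name_length(0x4141, b"garbage")  # length field holds garbage
except AssertionError as err:
    print("assertion failed:", err)
```

In the daemon the failed assertion aborts the process, which AMF then treats as avaDown with nodeFailfast recovery, as the syslog shows.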

