[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller
I attached the bt and the msgd trace file. Below is a snippet of the bt:

#2 0x7f44089ef197 in __osafassert_fail (__file=0x7f4408a41987 "osaf_extended_name.c", __line=139, __func=0x7f4408a419f0 <__FUNCTION__.2883> "osaf_extended_name_length", __assertion=0x7f4408a41960 "length < SA_MAX_UNEXTENDED_NAME_LENGTH") at sysf_def.c:281
#3 0x7f44089ead1e in osaf_extended_name_length (name=0x67a72c) at osaf_extended_name.c:139
#4 0x7f44089fe7ff in osaf_encode_sanamet (ub=0x7fff9f4f09d0, name=0x67a72c) at hj_enc.c:403
#5 0x7f44089eb275 in ncs_edp_sanamet (hdl=0x6654c0, edu_tkn=0x0, ptr=0x67a72c, ptr_data_len=0x7fff9f4eee14, buf_env=0x7fff9f4f0130, op=EDP_OP_TYPE_ENC, o_err=0x7fff9f4f0238) at saf_edu.c:62
#6 0x7f44089f8ca1 in ncs_edu_run_edp (edu_hdl=0x6654c0, edu_tkn=0x0, rule=0x7fff9f4ef190, edp=0x404f40, ptr=0x67a72c, dcnt=0x7fff9f4eee14, buf_env=0x7fff9f4f0130, optype=EDP_OP_TYPE_ENC, o_err=0x7fff9f4f0238) at hj_edu.c:499
#7 0x7f44089f99b2 in ncs_edu_prfm_enc_on_non_ptr (edu_hdl=0x6654c0, edu_tkn=0x0, hdl_node=0x0, rule=0x7fff9f4ef190, ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364, buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238) at hj_edu.c:972
#8 0x7f44089f9302 in ncs_edu_perform_exec_action_on_non_ptr (edu_hdl=0x6654c0, edu_tkn=0x0, hdl_node=0x0, rule=0x7fff9f4ef190, optype=EDP_OP_TYPE_ENC, ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364, buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238) at hj_edu.c:805
#9 0x7f44089f92a0 in ncs_edu_perform_exec_action (edu_hdl=0x6654c0, edu_tkn=0x0, hdl_node=0x0, rule=0x7fff9f4ef190, optype=EDP_OP_TYPE_ENC, ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364, buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238) at hj_edu.c:780
#10 0x7f44089f9041 in ncs_edu_exec_rule (edu_hdl=0x6654c0, edu_tkn=0x0, hdl_node=0x0, rule=0x7fff9f4ef190, ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364, buf_env=0x7fff9f4f0130, optype=EDP_OP_TYPE_ENC, o_err=0x7fff9f4f0238) at hj_edu.c:627
#11 0x7f44089fa8db in ncs_edu_run_rules_for_enc (edu_hdl=0x6654c0, edu_tkn=0x0, hdl_node=0x0, prog=0x7fff9f4ef150, ptr=0x67a72c, ptr_data_len=0x7fff9f4ef364, buf_env=0x7fff9f4f0130, o_err=0x7fff9f4f0238, instr_count=4) at hj_edu.c:1666

Attachments:
- [bt_msgd.tar](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/1ff9fd44/ec64/attachment/bt_msgd.tar) (20.5 kB; application/x-tar)
- [osafmsgd](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/1ff9fd44/ec64/attachment/osafmsgd) (280.6 kB; application/octet-stream)

---

** [tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the controller**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 06, 2016 06:04 AM UTC
**Owner:** nobody
**Attachments:**
- [Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog) (716.7 kB; application/octet-stream)
- [Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog) (696.4 kB; application/octet-stream)

Environment details --
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 1PBE enabled with 30K objects )

Summary : --
The cluster was reset because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.

Steps followed & Observed behaviour --
1. Invoked failover.
2. After a few successful failovers, the new active controller rebooted because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. This happened while the previous active controller was still joining the cluster in the standby role, which resulted in a cluster reset.

[Timeline: Sep 6 00:13:02 sofo-s2]
Sep 6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, dest:13)
Sep 6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed.
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: NO 'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: ER safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep 6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:
1. Syslog attached.
2. msgnd and msgd trace were not enabled.

---
Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing
[tickets] [opensaf:tickets] #1999 osafntfd on active controller crashed while logging to alarm stream
- **status**: unassigned --> accepted
- **assigned_to**: A V Mahesh (AVM)
- **Component**: ntf --> log
- **Milestone**: 4.7.2 --> 5.1.RC1
- **Comment**: Even when linked with the new agent code (A.2.2), if the client calls saLogInitialize() with version A.2.1, the CLM status should be ignored.

---

** [tickets:#1999] osafntfd on active controller crashed while logging to alarm stream**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 09:34 AM UTC
**Owner:** A V Mahesh (AVM)

Environment details --
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Summary : --
NTFD crashed on the active controller while logging a notification to the alarm stream.

Steps followed & Observed behaviour --
-> Initially performed a couple of switchovers and tests on the AMF application.
-> Performed a CLM lock operation on the standby SC-1 and later unlocked it.
-> Performed a switchover such that SC-1 became the active controller.
-> Stopped opensafd on PL-4. NTFD on the active controller crashed.

Sep 6 10:18:25 CONTROLLER-1 osafamfd[2262]: NO Node 'PL-4' left the cluster ..
Sep 6 10:18:25 CONTROLLER-1 osafntfd[2242]: osaf_abort(31) called from 0x414d1e with errno=11
Sep 6 10:18:25 CONTROLLER-1 osafamfnd[2272]: NO 'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

-> Below is the excerpt from the ntfd trace.
Sep 6 10:18:25.436394 osafntfd [2242:NtfAdmin.cc:0252] T2 New notification received, id: 682 Sep 6 10:18:25.436398 osafntfd [2242:NtfAdmin.cc:0187] >> processNotification Sep 6 10:18:25.436404 osafntfd [2242:NtfNotification.cc:0045] T3 constructor 0x685790, notId: 682 Sep 6 10:18:25.436409 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header Sep 6 10:18:25.436412 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header Sep 6 10:18:25.436425 osafntfd [2242:NtfAdmin.cc:0200] T2 notification 682 with type 16384 added, notificationMap size is 1 Sep 6 10:18:25.436431 osafntfd [2242:NtfLogger.cc:0130] >> log Sep 6 10:18:25.436435 osafntfd [2242:NtfLogger.cc:0132] T2 notification Id=682 received in logger with size 0 Sep 6 10:18:25.436439 osafntfd [2242:NtfLogger.cc:0135] T2 IS LOCAL, logging Sep 6 10:18:25.436442 osafntfd [2242:NtfLogger.cc:0166] >> checkQueueAndLog Sep 6 10:18:25.436447 osafntfd [2242:NtfLogger.cc:0196] >> logNotification Sep 6 10:18:25.436452 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header Sep 6 10:18:25.436455 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header Sep 6 10:18:25.436460 osafntfd [2242:NtfLogger.cc:0231] T2 Logging notification to alarm stream Sep 6 10:18:25.436495 osafntfd [2242:lga_api.c:1151] >> saLogWriteLogAsync Sep 6 10:18:25.436500 osafntfd [2242:lga_api.c:1015] >> handle_log_record Sep 6 10:18:25.436507 osafntfd [2242:lga_api.c:1110] << handle_log_record Sep 6 10:18:25.436518 osafntfd [2242:lga_api.c:1229] TR **saLogWriteLogAsync Node not CLM member or stale client** Sep 6 10:18:25.436524 osafntfd [2242:lga_api.c:1320] << saLogWriteLogAsync Sep 6 10:18:42.472616 osafntfd [2176:ntfs_main.c:0181] >> initialize --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. 
Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1993 amf: amfnd crashes during su lock if CSI attribute name or value is a long dn.
- **Milestone**: future --> 5.1.RC1

---

** [tickets:#1993] amf: amfnd crashes during su lock if CSI attribute name or value is a long dn.**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Thu Sep 01, 2016 11:09 AM UTC by Praveen
**Last Updated:** Thu Sep 01, 2016 11:10 AM UTC
**Owner:** Praveen
**Attachments:**
- [amfnd_crash.tgz](https://sourceforge.net/p/opensaf/tickets/1993/attachment/amfnd_crash.tgz) (69.4 kB; application/x-compressed)

Configuration: In the long DN AMF demo, add a CSI attribute to the CSI and give it a long DN as its attribute value.
1) Bring the configuration up.
2) Lock the SU.
3) AMFND crashes.

AMFND uses memcpy() and thus works with the original CSI attribute values from csi_rec. It frees the memory in avsv_amf_cbk_free() when the CSI_SET callback arrives. During SU lock, it tries to free the same memory again while deleting the record.
In AMFND and AMFD, all SaNameT handling should be done using the osaf_extended_name_alloc() API. The issue also applies to messages related to the CSI attribute change callback.
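The double free described in this ticket can be illustrated with a minimal, self-contained C sketch. It does not use the OpenSAF types or APIs: `attr_t`, `attr_make`, `attr_copy_shallow`, `attr_copy_deep`, and `attr_free` are hypothetical stand-ins for a CSI attribute record whose long DN value lives on the heap. The sketch only assumes that a plain memcpy() of such a record copies the pointer and not the string it points to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CSI attribute record whose value is a
 * heap-allocated long DN. This is not the real AMF structure. */
typedef struct {
    char *value; /* owned, heap-allocated string */
} attr_t;

/* Create an attribute that owns a private copy of `s`
 * (value is NULL if the allocation fails). */
attr_t attr_make(const char *s) {
    attr_t a;
    size_t n = strlen(s) + 1;
    a.value = malloc(n);
    if (a.value != NULL) {
        memcpy(a.value, s, n);
    }
    return a;
}

/* Shallow copy: what memcpy() of the whole record does. Both records
 * now point at the same heap block, so freeing both is a double free. */
attr_t attr_copy_shallow(const attr_t *src) {
    attr_t dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}

/* Deep copy: the copy gets its own heap block, so the original and the
 * copy can be freed independently. */
attr_t attr_copy_deep(const attr_t *src) {
    return attr_make(src->value);
}

/* Release the value and clear the pointer so a repeated call is harmless. */
void attr_free(attr_t *a) {
    free(a->value);
    a->value = NULL;
}
```

With the shallow copy, releasing the callback copy and later deleting the original record frees the same block twice, which matches the crash pattern in the ticket. With the deep copy, each owner frees only its own block.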
[tickets] [opensaf:tickets] #1788 cpsv: saCkptCheckpointWrite() returns SA_AIS_ERR_NOT_EXIST after headless state
- **status**: review --> fixed
- **Comment**:

changeset: 8011:6accddff2419
parent: 8007:661036525753
user: Hoang Vo
date: Wed Sep 07 09:18:58 2016 +0530
summary: cpd: To reduce updating time out [#1788]

changeset: 8012:723f2cdad674
branch: opensaf-5.1.x
parent: 8008:ba9a421fbacf
user: Hoang Vo
date: Wed Sep 07 09:19:28 2016 +0530
summary: cpd: To reduce updating time out [#1788]

changeset: 8013:260bf6c3a621
branch: opensaf-5.0.x
tag: tip
parent: 8009:a2713c3caf11
user: Hoang Vo
date: Wed Sep 07 09:19:52 2016 +0530
summary: cpd: To reduce updating time out [#1788]

---

** [tickets:#1788] cpsv: saCkptCheckpointWrite() returns SA_AIS_ERR_NOT_EXIST after headless state**

**Status:** fixed
**Milestone:** 5.0.1
**Created:** Thu Apr 28, 2016 02:20 AM UTC by Pham Hoang Nhat
**Last Updated:** Fri May 13, 2016 02:12 AM UTC
**Owner:** Pham Hoang Nhat

The problem happened in the following scenario:
1. The application calls saCkptCheckpointOpen() to create a collocated checkpoint on SC-2. The replica of the checkpoint on SC-2 is active.
2. The application calls saCkptCheckpointOpen() to open the collocated checkpoint on PL-5.
3. The application creates a section and accesses the checkpoint on PL-5.
4. Both SCs go down.
5. Both SCs come up again.
6. The application accesses the checkpoint with saCkptCheckpointWrite(). The error code SA_AIS_ERR_NOT_EXIST is returned.

This problem happened because the osafckptnd process ID on SC-2 was the same before and after the headless state, so its MDS destination was also the same. Thus, when SC-2 came up, there was a short period in which CPD had not yet assigned a new active replica. During that period the application sent its checkpoint access request to the CPND on SC-2, which no longer hosted the active replica, and that CPND returned SA_AIS_ERR_NOT_EXIST.
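Why an unchanged process ID hides the restart can be sketched as follows. This is only an illustration under a stated assumption: the address layout (node ID in the upper 32 bits, process ID in the lower 32 bits) and the names `make_dest` and `restart_visible` are hypothetical and are not taken from the MDS code. The ticket only states that the same process ID leads to the same MDS destination.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address layout (an assumption for this sketch): node ID
 * in the upper 32 bits, process ID in the lower 32 bits. */
uint64_t make_dest(uint32_t node_id, uint32_t pid) {
    return ((uint64_t)node_id << 32) | pid;
}

/* A peer that only compares destination addresses can notice a restart
 * only if the address changed. Returns 1 if the restart is visible. */
int restart_visible(uint64_t dest_before, uint64_t dest_after) {
    return dest_before != dest_after;
}
```

If the node daemon comes back with the same process ID, `make_dest()` yields the same value before and after the restart and `restart_visible()` returns 0, so a peer keeps treating the address as the holder of the active replica.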
[tickets] [opensaf:tickets] #1670 cpsv: Checkpoint is destroyed althought there is a user using it
- **status**: review --> fixed
- **Comment**:

changeset: 8007:661036525753
parent: 8005:36f63cf5aa4d
user: Hoang Vo
date: Wed Sep 07 09:13:01 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no longer used [#1670]

changeset: 8008:ba9a421fbacf
branch: opensaf-5.1.x
parent: 8006:f8bc9f897235
user: Hoang Vo
date: Wed Sep 07 09:13:25 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no longer used [#1670]

changeset: 8009:a2713c3caf11
branch: opensaf-5.0.x
parent: 8001:3e43cfb7d74f
user: Hoang Vo
date: Wed Sep 07 09:13:42 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no longer used [#1670]

changeset: 8010:1c50d7f77c2a
branch: opensaf-4.7.x
tag: tip
parent: 8002:6b58ec847a47
user: Hoang Vo
date: Wed Sep 07 09:14:00 2016 +0530
summary: cpsv: CPD starts retention duration timer if the checkpoint is no longer used [#1670]

---

** [tickets:#1670] cpsv: Checkpoint is destroyed althought there is a user using it**

**Status:** fixed
**Milestone:** 4.7.2
**Created:** Fri Jan 22, 2016 04:09 AM UTC by Pham Hoang Nhat
**Last Updated:** Wed May 04, 2016 05:35 PM UTC
**Owner:** Pham Hoang Nhat

Problem description:
The checkpoint is destroyed although a user is still using it.

Steps to reproduce the problem:
1. Create a checkpoint on PL3 with creation flag SA_CKPT_WR_ALL_REPLICAS and retention duration = 0.
2. Open this checkpoint on PL4.
3. Restart PL3.

After step 3, the checkpoint is destroyed although it is still in use on PL4.
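The rule named in the changeset summary ("CPD starts retention duration timer if the checkpoint is no longer used") can be sketched as a small reference-counting model. The sketch is an assumption-laden illustration, not the CPD implementation: `ckpt_t`, `ckpt_total_users`, `ckpt_node_down`, and `MAX_NODES` are hypothetical names, and the timer is reduced to a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8 /* hypothetical upper bound for this sketch */

/* Hypothetical directory-side view of one checkpoint. */
typedef struct {
    unsigned users_on_node[MAX_NODES]; /* open handles per node */
    bool retention_timer_running;
} ckpt_t;

/* Count the users of the checkpoint across all nodes. */
unsigned ckpt_total_users(const ckpt_t *c) {
    unsigned total = 0;
    for (int i = 0; i < MAX_NODES; i++) {
        total += c->users_on_node[i];
    }
    return total;
}

/* A node went down: forget its users, and start the retention timer
 * only when no user is left on any other node. */
void ckpt_node_down(ckpt_t *c, int node) {
    assert(node >= 0 && node < MAX_NODES);
    c->users_on_node[node] = 0;
    if (ckpt_total_users(c) == 0) {
        c->retention_timer_running = true;
    }
}
```

In the scenario from the ticket, restarting PL3 while PL4 still holds the checkpoint open leaves one user, so the timer is not started and a retention duration of 0 cannot destroy the checkpoint at that point.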
[tickets] [opensaf:tickets] #1574 CKPT: Support DNs longer than 255 bytes
Pushed to http://hg.code.sf.net/p/opensaf/documentation

changeset: 186:319b3ffccdc0
tag: tip
user: Hoang Vo
date: Wed Sep 07 08:58:08 2016 +0530
summary: cpsv: update PR document following Long DN extension [#1574]

---

** [tickets:#1574] CKPT: Support DNs longer than 255 bytes**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Oct 28, 2015 09:46 AM UTC by Pham Hoang Nhat
**Last Updated:** Tue Aug 23, 2016 10:15 AM UTC
**Owner:** Pham Hoang Nhat

Ticket [#191] introduced generic support in OpenSAF for DNs longer than 255 bytes. Each individual OpenSAF service also has to be adapted to support long DNs.

CKPT should have this feature, since applications may want to use long DNs as checkpoint names.
[tickets] [opensaf:tickets] #1962 libplms_hpi.so.0 loading issue - Opensaf with plms services
Try the attached patch instead of yours. If you are building from the released tar file rather than from source control, you will also need to modify Makefile.in in the same directory.

Attachments:
- [plm-1962.patch](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/06af13af/02af/attachment/plm-1962.patch) (368 Bytes; text/x-diff)

---

** [tickets:#1962] libplms_hpi.so.0 loading issue - Opensaf with plms services**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Fri Aug 19, 2016 08:09 PM UTC by Subrata Nath
**Last Updated:** Mon Sep 05, 2016 02:27 AM UTC
**Owner:** nobody

Hello,

I have installed opensaf 5.0.0 without PLMS for node HA purposes, and it works fine. Now I would like to use opensaf PLMS with openHPI.

My configure command is:

./configure CPPFLAGS=-DRUNASROOT OSAF_HARDEN_FLAGS="-fstack-protector-all -D_FORTIFY_SOURCE=2" HPI_LIBS="-L/usr/local/lib -lopenhpimarshal -lopenhpiutils -lopenhpi" --enable-hpi --with-openhpi --with-hpi-interface=B03 --enable-tipc=yes --enable-imm-pbe=yes --enable-ais-plm --enable-ais-smf --enable-ais-msg --enable-ais-lck --enable-ais-evt --enable-ais-ckpt --enable-ntf-imcn

During opensaf start-up, the following error message is seen while the .so is being loaded:

" ER dlopen() to load libplms_hpi.so failed with error /usr/lib64/opensaf/libplms_hpi.so.0: undefined symbol: plms_plmc_error_cbk"

Has this issue been seen before, or is it a configuration issue on my end? As per my understanding, it is caused by a makefile issue.

As a temporary fix, I copied the following methods (with the method names changed) from opensaf-5.0.0/osaf/services/saf/plmsv/plms/plms_plmc.c to opensaf-5.0.0/osaf/services/saf/plmsv/plms/hpi_intf/plms_hsm.c:

void plms_os_information_free(PLMS_PLMC_EE_OS_INFO *os_info)
static SaUint32T plms_os_information_parse(SaInt8T *os_info, PLMS_PLMC_EE_OS_INFO *evt_os_info)
int32_t plms_plmc_error_callbk(plmc_lib_error *msg)
int32_t plms_plmc_connect_callbk(SaInt8T *ee_id,SaInt8T *msg)
int32_t plms_plmc_udp_callbk(udp_msg *msg)

Could you please check the makefile for opensaf-5.0.0/osaf/services/saf/plmsv/plms/hpi_intf/, which also has dependencies outside of that folder?

Regards,
Subrata
[tickets] [opensaf:tickets] #2005 smfd: Inconsistent reading of settings
---

** [tickets:#2005] smfd: Inconsistent reading of settings**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 02:35 PM UTC by elunlen
**Last Updated:** Tue Sep 06, 2016 02:35 PM UTC
**Owner:** nobody

SMF reads the IMM settings and updates its cb globals at three points: when it is assigned active, in the OI apply callback, and after the init actions have been executed in the campaign init state. This leads to surprising behaviour regarding when changed IMM settings take effect.

Example:
Before executing a campaign that has a long campaign name (> 255 characters), longDnsAllowed and smfKeepDuState have to be changed.
1. If smfKeepDuState is changed before longDnsAllowed, the campaign will fail because the cb globals are not updated after the change of longDnsAllowed.
2. If longDnsAllowed is changed before smfKeepDuState, the campaign will succeed because cb is updated with the new longDnsAllowed setting when the OI apply callback is called for the smfKeepDuState change.
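The order dependence in the example above can be reproduced with a small model. This is a sketch under assumptions, not smfd code: `settings_t`, `model_t`, and the three functions are hypothetical, `imm` stands for the stored configuration and `cb` for the cached copy, and the model assumes that only a change of smfKeepDuState reaches the daemon through an apply callback that re-reads all settings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of settings taken from the ticket text. */
typedef struct {
    bool long_dns_allowed;
    bool keep_du_state;
} settings_t;

/* `imm` models the stored configuration, `cb` the cached copy. */
typedef struct {
    settings_t imm;
    settings_t cb;
} model_t;

/* Assumption: changing longDnsAllowed triggers no callback in the
 * daemon, so the cached copy is left untouched. */
void set_long_dns_allowed(model_t *m, bool v) {
    m->imm.long_dns_allowed = v;
}

/* Assumption: changing smfKeepDuState triggers the apply callback,
 * which re-reads every setting into the cached copy. */
void set_keep_du_state(model_t *m, bool v) {
    m->imm.keep_du_state = v;
    m->cb = m->imm;
}

/* A campaign with a long name is accepted only if the cached copy
 * says that long DNs are allowed. */
bool long_name_campaign_can_start(const model_t *m) {
    return m->cb.long_dns_allowed;
}
```

Changing smfKeepDuState first and longDnsAllowed second leaves the cache stale, while the reverse order refreshes it as a side effect. That side effect is the inconsistency the ticket reports.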
[tickets] [opensaf:tickets] #2004 SMF: smfd got crashed when triggered campaign for application upgrade.
The smfd crash stack trace is attached. GCC version: 6.1.0.

Attachments:
- [smfd_crash.rtf](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/85e4f056/ee2c/attachment/smfd_crash.rtf) (10.1 kB; application/rtf)

---

** [tickets:#2004] SMF: smfd got crashed when triggered campaign for application upgrade.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:35 AM UTC by Madhurika Koppula
**Last Updated:** Tue Sep 06, 2016 08:35 AM UTC
**Owner:** nobody
**Attachments:**
- [smf.tgz](https://sourceforge.net/p/opensaf/tickets/2004/attachment/smf.tgz) (1.6 MB; application/octet-stream)

**Environment Details:**
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 1PBE enabled ).

**Summary:**
smfd crashed with a segfault on the active controller.

**Steps followed & Observed behaviour:**
Tested SG upgrade of a 2N model with valid configurations.

**Observations:**
The active controller rebooted due to avaDown for smfd.

Below is the snippet of syslog on active controller: Sep 6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO SmfProcedureThread::getImmProcedure, IMM data for procedure safSmfProc=amfClusterProc-1,safSmfCampaign=Campaign2,safApp=safSmfService not found Sep 6 11:52:19 SLES-M-SLOT-1 osafimmnd[3661]: NO Implementer connected: 20 (safSmfProc1) <662, 2010f> Sep 6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start upgrade procedure safSmfProc=amfClusterProc-1 Sep 6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start procedure init actions Sep 6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: NO 'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast' **Sep 6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: ER safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast** Sep 6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60 Below is the snippet of osafsmfd trace on active controller: Sep 6 11:52:19.808986 osafsmfd [3745:SmfUpgradeProcedure.cc:0741] TR SmfUpgradeProcedure::calculateRollingSteps:calculateRollingSteps new SW install step added safSmfStep=0003 (with no act/deact unit) for node safAmfNode=PL-4,safAmfCluster=myAmfCluster Sep 6 11:52:19.808995 osafsmfd [3745:SmfUpgradeProcedure.cc:1876] >> addStepModifications Sep 6 11:52:19.809002 osafsmfd [3745:SmfUpgradeProcedure.cc:1931] >> addStepModificationsNode Sep 6 11:52:19.809008 osafsmfd [3745:imma_om_api.c:0160] >> saImmOmInitialize Sep 6 11:52:19.809015 osafsmfd [3745:imma_om_api.c:0186] TR OM client version A.2.1 Sep 6 11:52:19.809021 osafsmfd [3745:imma_om_api.c:0228] >> initialize_common Sep 6 11:52:19.809026 osafsmfd [3745:imma_init.c:0275] >> imma_startup: use count 1 Sep 6 11:52:19.809032 osafsmfd [3745:imma_init.c:0298] << imma_startup: use count 2 Sep 6 11:52:19.809040 osafsmfd [3745:imma_om_api.c:0246] T2 IMMA 
library syncronous timeout set to:3 Sep 6 11:52:19.809263 osafsmfd [3745:imma_om_api.c:0349] T1 Trying to add OM client id:727 node:2010f Sep 6 11:52:19.809280 osafsmfd [3745:imma_om_api.c:0442] << initialize_common Sep 6 11:52:19.809287 osafsmfd [3745:imma_om_api.c:0214] << saImmOmInitialize Sep 6 11:52:19.809293 osafsmfd [3745:imma_om_api.c:0931] >> saImmOmAdminOwnerInitialize Sep 6 11:52:19.811060 osafsmfd [3745:imma_om_api.c:1143] T1 Admin owner init successful Sep 6 11:52:19.811076 osafsmfd [3745:imma_om_api.c:1144] << saImmOmAdminOwnerInitialize Sep 6 11:52:19.811083 osafsmfd [3745:imma_om_api.c:5528] >> saImmOmAccessorInitialize Sep 6 11:52:19.811091 osafsmfd [3745:imma_om_api.c:5626] << saImmOmAccessorInitialize Sep 6 12:21:09.873661 osafsmfd [2421:ncs_main_pub.c:0220] TR NCS:PROCESS_ID=2421 Attachments: Active Controller: 1)syslog 2)osafsmfd, osafsmfnd traces. 3)osafimmnd traces. --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT
- **summary**: IMM: AdminOperation returns BAD_HANDLE when invoked second time --> IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT
- Description has changed:

Diff:

--- old
+++ new
@@ -5,7 +5,7 @@
 Summary:
 Steps to Reproduce
 1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in callback with time more that OI_CALLBACK_TIMEOUT value
-2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait
+2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait OR Invoke any Ccb operation

 Observed Bahavior:
 Step1 will return SA_AIS_ERR_TIMEOUT (Expected)

---

** [tickets:#2001] IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 06, 2016 07:18 AM UTC
**Owner:** nobody
**Attachments:**
- [AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip) (95.1 kB; application/zip)

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes, 1 PBE enabled

Summary:

Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback for longer than the OI_CALLBACK_TIMEOUT value.
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait, OR invoke any CCB operation.

Observed Behavior:
Step 1 returns SA_AIS_ERR_TIMEOUT (expected).
Step 2 returns SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected).

Sep 6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file /tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep 6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep 6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 (testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep 6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 21 - ignoring
Sep 6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down on syncronous request, discarding request
Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)

Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces are attached.
[tickets] [opensaf:tickets] #1999 osafntfd on active controller crashed while logging to alarm stream
- **summary**: LOG : ntfd on active controller crashed while logging to alarm stream --> osafntfd on active controller crashed while logging to alarm stream
- **Component**: log --> ntf
- **Comment**: After the integration of LOG with CLM (#1638), all LOG clients should reinitialize after a CLM unlock operation. It might be that NTF, as a LOG client, is not reinitializing after the CLM unlock and therefore got the return value 31.

---

** [tickets:#1999] osafntfd on active controller crashed while logging to alarm stream**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:09 AM UTC
**Owner:** nobody

Environment details --
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Summary : --
NTFD crashed on the active controller while logging a notification to the alarm stream.

Steps followed & Observed behaviour --
-> Initially performed a couple of switchovers and tests on the AMF application.
-> Performed a CLM lock operation on the standby SC-1 and later unlocked it.
-> Performed a switchover such that SC-1 became the active controller.
-> Stopped opensafd on PL-4. NTFD on the active controller crashed.

Sep 6 10:18:25 CONTROLLER-1 osafamfd[2262]: NO Node 'PL-4' left the cluster ..
Sep 6 10:18:25 CONTROLLER-1 osafntfd[2242]: osaf_abort(31) called from 0x414d1e with errno=11
Sep 6 10:18:25 CONTROLLER-1 osafamfnd[2272]: NO 'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

-> Below is the excerpt from the ntfd trace.
Sep 6 10:18:25.436394 osafntfd [2242:NtfAdmin.cc:0252] T2 New notification received, id: 682 Sep 6 10:18:25.436398 osafntfd [2242:NtfAdmin.cc:0187] >> processNotification Sep 6 10:18:25.436404 osafntfd [2242:NtfNotification.cc:0045] T3 constructor 0x685790, notId: 682 Sep 6 10:18:25.436409 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header Sep 6 10:18:25.436412 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header Sep 6 10:18:25.436425 osafntfd [2242:NtfAdmin.cc:0200] T2 notification 682 with type 16384 added, notificationMap size is 1 Sep 6 10:18:25.436431 osafntfd [2242:NtfLogger.cc:0130] >> log Sep 6 10:18:25.436435 osafntfd [2242:NtfLogger.cc:0132] T2 notification Id=682 received in logger with size 0 Sep 6 10:18:25.436439 osafntfd [2242:NtfLogger.cc:0135] T2 IS LOCAL, logging Sep 6 10:18:25.436442 osafntfd [2242:NtfLogger.cc:0166] >> checkQueueAndLog Sep 6 10:18:25.436447 osafntfd [2242:NtfLogger.cc:0196] >> logNotification Sep 6 10:18:25.436452 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header Sep 6 10:18:25.436455 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header Sep 6 10:18:25.436460 osafntfd [2242:NtfLogger.cc:0231] T2 Logging notification to alarm stream Sep 6 10:18:25.436495 osafntfd [2242:lga_api.c:1151] >> saLogWriteLogAsync Sep 6 10:18:25.436500 osafntfd [2242:lga_api.c:1015] >> handle_log_record Sep 6 10:18:25.436507 osafntfd [2242:lga_api.c:1110] << handle_log_record Sep 6 10:18:25.436518 osafntfd [2242:lga_api.c:1229] TR **saLogWriteLogAsync Node not CLM member or stale client** Sep 6 10:18:25.436524 osafntfd [2242:lga_api.c:1320] << saLogWriteLogAsync Sep 6 10:18:42.472616 osafntfd [2176:ntfs_main.c:0181] >> initialize --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. 
[tickets] [opensaf:tickets] #2003 amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.
- **status**: accepted --> review

---

**[tickets:#2003] amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.**

**Status:** review
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:31 AM UTC by Praveen
**Last Updated:** Tue Sep 06, 2016 08:32 AM UTC
**Owner:** Praveen
**Attachments:**

- [term_failed.tgz](https://sourceforge.net/p/opensaf/tickets/2003/attachment/term_failed.tgz) (30.1 kB; application/x-compressed)

Conf: 2N model, one NPI comp in NPI SU.

Steps to reproduce:
1) Add the application using the immcfg command.
2) Lock the SG.
3) Unlock-in and unlock the SUs.
4) Arrange for the instantiation and cleanup scripts to return a non-zero status.
5) Unlock the SG.

When the SG is unlocked, AMFND initiates active assignments by instantiating the only component. After the instantiation failure, AMFND tries to clean up the component, and the cleanup fails. AMFND marks the comp and SU as TERM_FAILED, but it neither responds to AMFD about the completion of the assignment nor sends any recovery request. Because of this, the SG remains unstable in the REALIGN state. In this state, no admin operation is allowed.

Traces are attached.

Although the issue seems similar to #538, it differs in one aspect. In #538, the SU moves to TERM_FAILED state and there is a possibility of failover/switchover because standby assignments are present. In the present case, the failure happened during initial assignments, so there is no standby to switch over or fail over to.
[tickets] [opensaf:tickets] #2004 SMF: smfd got crashed when triggered campaign for application upgrade.
---

**[tickets:#2004] SMF: smfd got crashed when triggered campaign for application upgrade.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:35 AM UTC by Madhurika Koppula
**Last Updated:** Tue Sep 06, 2016 08:35 AM UTC
**Owner:** nobody
**Attachments:**

- [smf.tgz](https://sourceforge.net/p/opensaf/tickets/2004/attachment/smf.tgz) (1.6 MB; application/octet-stream)

**Environment Details:**
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 1PBE enabled ).

**Summary:**
smfd crashed with a segfault on the active controller.

**Steps followed & Observed behaviour:**
Test SGupgrade of 2N model with valid configurations.

**Observations:**
The active controller rebooted because of avaDown for smfd.

Below is the snippet of syslog on the active controller:

Sep 6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO SmfProcedureThread::getImmProcedure, IMM data for procedure safSmfProc=amfClusterProc-1,safSmfCampaign=Campaign2,safApp=safSmfService not found
Sep 6 11:52:19 SLES-M-SLOT-1 osafimmnd[3661]: NO Implementer connected: 20 (safSmfProc1) <662, 2010f>
Sep 6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start upgrade procedure safSmfProc=amfClusterProc-1
Sep 6 11:52:19 SLES-M-SLOT-1 osafsmfd[3745]: NO PROC: Start procedure init actions
Sep 6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: NO 'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
**Sep 6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: ER safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast**
Sep 6 11:52:19 SLES-M-SLOT-1 osafamfnd[3726]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60

Below is the snippet of the osafsmfd trace on the active controller:

Sep 6 11:52:19.808986 osafsmfd [3745:SmfUpgradeProcedure.cc:0741] TR SmfUpgradeProcedure::calculateRollingSteps:calculateRollingSteps new SW install step added safSmfStep=0003 (with no act/deact unit) for node safAmfNode=PL-4,safAmfCluster=myAmfCluster
Sep 6 11:52:19.808995 osafsmfd [3745:SmfUpgradeProcedure.cc:1876] >> addStepModifications
Sep 6 11:52:19.809002 osafsmfd [3745:SmfUpgradeProcedure.cc:1931] >> addStepModificationsNode
Sep 6 11:52:19.809008 osafsmfd [3745:imma_om_api.c:0160] >> saImmOmInitialize
Sep 6 11:52:19.809015 osafsmfd [3745:imma_om_api.c:0186] TR OM client version A.2.1
Sep 6 11:52:19.809021 osafsmfd [3745:imma_om_api.c:0228] >> initialize_common
Sep 6 11:52:19.809026 osafsmfd [3745:imma_init.c:0275] >> imma_startup: use count 1
Sep 6 11:52:19.809032 osafsmfd [3745:imma_init.c:0298] << imma_startup: use count 2
Sep 6 11:52:19.809040 osafsmfd [3745:imma_om_api.c:0246] T2 IMMA library syncronous timeout set to:3
Sep 6 11:52:19.809263 osafsmfd [3745:imma_om_api.c:0349] T1 Trying to add OM client id:727 node:2010f
Sep 6 11:52:19.809280 osafsmfd [3745:imma_om_api.c:0442] << initialize_common
Sep 6 11:52:19.809287 osafsmfd [3745:imma_om_api.c:0214] << saImmOmInitialize
Sep 6 11:52:19.809293 osafsmfd [3745:imma_om_api.c:0931] >> saImmOmAdminOwnerInitialize
Sep 6 11:52:19.811060 osafsmfd [3745:imma_om_api.c:1143] T1 Admin owner init successful
Sep 6 11:52:19.811076 osafsmfd [3745:imma_om_api.c:1144] << saImmOmAdminOwnerInitialize
Sep 6 11:52:19.811083 osafsmfd [3745:imma_om_api.c:5528] >> saImmOmAccessorInitialize
Sep 6 11:52:19.811091 osafsmfd [3745:imma_om_api.c:5626] << saImmOmAccessorInitialize
Sep 6 12:21:09.873661 osafsmfd [2421:ncs_main_pub.c:0220] TR NCS:PROCESS_ID=2421

Attachments:
Active Controller:
1) syslog
2) osafsmfd, osafsmfnd traces.
3) osafimmnd traces.
[tickets] [opensaf:tickets] #2003 amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.
- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Component**: unknown --> amf
- **Part**: - --> nd

---

**[tickets:#2003] amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:31 AM UTC by Praveen
**Last Updated:** Tue Sep 06, 2016 08:31 AM UTC
**Owner:** Praveen
**Attachments:**

- [term_failed.tgz](https://sourceforge.net/p/opensaf/tickets/2003/attachment/term_failed.tgz) (30.1 kB; application/x-compressed)

Conf: 2N model, one NPI comp in NPI SU.

Steps to reproduce:
1) Add the application using the immcfg command.
2) Lock the SG.
3) Unlock-in and unlock the SUs.
4) Arrange for the instantiation and cleanup scripts to return a non-zero status.
5) Unlock the SG.

When the SG is unlocked, AMFND initiates active assignments by instantiating the only component. After the instantiation failure, AMFND tries to clean up the component, and the cleanup fails. AMFND marks the comp and SU as TERM_FAILED, but it neither responds to AMFD about the completion of the assignment nor sends any recovery request. Because of this, the SG remains unstable in the REALIGN state. In this state, no admin operation is allowed.

Traces are attached.

Although the issue seems similar to #538, it differs in one aspect. In #538, the SU moves to TERM_FAILED state and there is a possibility of failover/switchover because standby assignments are present. In the present case, the failure happened during initial assignments, so there is no standby to switch over or fail over to.
[tickets] [opensaf:tickets] #2003 amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.
---

**[tickets:#2003] amf: SG unstable when SU moves to TERM_FAILED state during fresh assignments.**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 08:31 AM UTC by Praveen
**Last Updated:** Tue Sep 06, 2016 08:31 AM UTC
**Owner:** nobody
**Attachments:**

- [term_failed.tgz](https://sourceforge.net/p/opensaf/tickets/2003/attachment/term_failed.tgz) (30.1 kB; application/x-compressed)

Conf: 2N model, one NPI comp in NPI SU.

Steps to reproduce:
1) Add the application using the immcfg command.
2) Lock the SG.
3) Unlock-in and unlock the SUs.
4) Arrange for the instantiation and cleanup scripts to return a non-zero status.
5) Unlock the SG.

When the SG is unlocked, AMFND initiates active assignments by instantiating the only component. After the instantiation failure, AMFND tries to clean up the component, and the cleanup fails. AMFND marks the comp and SU as TERM_FAILED, but it neither responds to AMFD about the completion of the assignment nor sends any recovery request. Because of this, the SG remains unstable in the REALIGN state. In this state, no admin operation is allowed.

Traces are attached.

Although the issue seems similar to #538, it differs in one aspect. In #538, the SU moves to TERM_FAILED state and there is a possibility of failover/switchover because standby assignments are present. In the present case, the failure happened during initial assignments, so there is no standby to switch over or fail over to.
[tickets] [opensaf:tickets] #1988 AMF: Admin operation continuation does not work with short cluster init timeout
The attribute saAmfClusterStartupTimeout is currently set to 10 seconds by default. The timer is started only once all NCS SUs of the active controller have been assigned. In big clusters, if the timeout is left at 10 seconds, many nodes have not yet joined the cluster when it expires and many SUs are still out of service, so AMFD cannot start assignments at cluster init timeout.

Aug 19 12:32:05.923649 osafamfd [6705:timer.cc:0066] >> avd_start_tmr: 1
Aug 19 12:32:15.987858 osafamfd [6705:cluster.cc:0055] >> avd_cluster_tmr_init_evh
Aug 19 12:32:15.988226 osafamfd [6705:sg_2n_fsm.cc:2808] >> realign: 'safSg=2N,safApp=ABC-01'
Aug 19 12:32:15.988254 osafamfd [6705:sg_2n_fsm.cc:0606] TR No in service SUs available in the SG
Aug 19 12:32:15.988640 osafamfd [6705:sg_2n_fsm.cc:2808] >> realign: 'safSg=2N,safApp=ABC-02'
Aug 19 12:32:15.988661 osafamfd [6705:sg_2n_fsm.cc:0606] TR No in service SUs available in the SG

However, this does not cause any problem in the cluster start-up scenario, because AMFD also starts assignments upon receiving avd_su_oper_state_evh(), by calling su_insvc(). This happens after a node completes joining the cluster. The earlier a node joins the cluster, the better the chance that its SU is assigned active. Also, until all NCS SUs of the active controller have been assigned, the cb state is not INIT_DONE and AMFD rejects the node_up msg of all other nodes.

In admin operation continuation after headless, AMFD cannot follow a similar sequence, because the way an SU gets a fresh assignment (su_insvc) differs from the way an SU continues its pending assignment (susi_success). AMFD needs all nodes to have joined the cluster before it continues an admin operation.

---

**[tickets:#1988] AMF: Admin operation continuation does not work with short cluster init timeout**

**Status:** assigned
**Milestone:** 5.1.RC1
**Created:** Wed Aug 31, 2016 12:04 AM UTC by Minh Hon Chau
**Last Updated:** Wed Aug 31, 2016 12:04 AM UTC
**Owner:** Minh Hon Chau

In the scenario of admin continuation after headless, if saAmfClusterStartupTimeout is configured with a short value, the admin continuation is initiated when saAmfClusterStartupTimeout expires while the SU is still OUT OF SERVICE. The eventual result is failure of the admin operation after headless.
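The timing problem described in #1988 can be illustrated with a small model. This is only a sketch, written in Python for brevity (AMFD itself is not Python); `ClusterState`, `may_start_fresh_assignment`, `may_continue_admin_op` and the field names are hypothetical and are not AMFD identifiers. It assumes, as the comment argues, that a pending admin operation should continue only once every expected node has joined, whether or not the startup timer has expired.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ClusterState:
    """Hypothetical snapshot of what the director knows after a headless period."""
    expected_nodes: Set[str]
    joined_nodes: Set[str] = field(default_factory=set)
    startup_timer_expired: bool = False


def may_start_fresh_assignment(state: ClusterState, node: str) -> bool:
    # Fresh assignments can be retried per node: a node that joins after the
    # timer expired still gets its chance (the su_insvc path in the ticket).
    return node in state.joined_nodes


def may_continue_admin_op(state: ClusterState) -> bool:
    # Continuation (the susi_success path) has no per-node retry, so in this
    # sketch it waits for all expected nodes rather than for the timer.
    return state.expected_nodes <= state.joined_nodes


state = ClusterState(expected_nodes={"SC-1", "SC-2", "PL-3", "PL-4"})
state.joined_nodes.update({"SC-1", "SC-2"})
state.startup_timer_expired = True   # the short default timeout has run out

print(may_continue_admin_op(state))  # False: PL-3 and PL-4 are still missing
state.joined_nodes.update({"PL-3", "PL-4"})
print(may_continue_admin_op(state))  # True
```

The point of the model is that timer expiry alone never makes `may_continue_admin_op` true; how AMFD should learn the set of expected nodes is left open here.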
[tickets] [opensaf:tickets] #2002 CLM : Agent crashed for invalid check in buffer notification parameter
---

**[tickets:#2002] CLM : Agent crashed for invalid check in buffer notification parameter**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 08:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 08:15 AM UTC
**Owner:** nobody

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Steps followed & Observed behaviour
--
-> Call the saClmClusterTrack_4 API with the CURRENT flag and the buffer parameter populated. Here the buffer parameter is populated by allocating sufficient memory for numberOfItems, but notification contains garbage values. In that case, the agent crashed with the following backtrace:

-> #3 0x7f4ccb370c9f in osaf_extended_name_length (name=0x9d5e4e) at osaf_extended_name.c:139
-> #4 0x7f4cca9ff27c in clma_validate_flags_buf_4 (hdl_rec=0x97cbc0, flags=1 '\001', buf=0x97c190) at clma_api.c:183
-> #5 0x7f4ccaa00fe5 in clmaclustertrack (clmHandle=4290772993, flags=1 '\001', buf=0x0, buf_4=0x97c190) at clma_api.c:1032
-> #6 0x7f4ccaa00d40 in saClmClusterTrack_4 (clmHandle=4290772993, flags=1 '\001', buf=0x97c190) at clma_api.c:958

Expected behaviour
--
If the buffer parameter is NULL, CLM shall invoke a callback. If the buffer parameter is not NULL, CLM should check only the value of numberOfItems and evaluate whether the user has allocated sufficient memory. With the #1906 changes, the contents of notification are also verified, but only the structure member numberOfItems should be verified.
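The expected behaviour in #2002 amounts to a check that never dereferences the caller's notification entries. A minimal sketch of that rule follows, modelled in Python rather than in the agent's C code; `NotificationBuffer`, `check_track_current` and the three result labels are hypothetical names chosen for illustration, not the CLM agent's API or return codes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NotificationBuffer:
    """Hypothetical stand-in for the buffer passed to saClmClusterTrack_4."""
    number_of_items: int
    notification: Optional[List[object]]  # caller-allocated; contents may be garbage


def check_track_current(buf: Optional[NotificationBuffer], members: int) -> str:
    """Classify a CURRENT request by looking at number_of_items only."""
    if buf is None:
        return "CALLBACK"      # no buffer: the result is delivered via the callback
    if buf.number_of_items < members:
        return "NO_SPACE"      # the caller did not allocate enough room
    return "FILL_BUFFER"       # enough room: the entries will be overwritten anyway


garbage = NotificationBuffer(number_of_items=5, notification=[object()] * 5)
print(check_track_current(None, 5))     # CALLBACK
print(check_track_current(garbage, 5))  # FILL_BUFFER
```

Because the entries are output-only from the caller's point of view, whatever they contain before the call does not influence the result in this model.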
[tickets] [opensaf:tickets] #1998 amf: protection group track non existing csi returns SA_AIS_ERR_INIT
- **status**: review --> fixed
- **Comment**:

default:
[staging:36f63c]
changeset: 8005:36f63cf5aa4d
parent: 8003:4dfd86ce806e
user: Long Nguyen
date: Tue Sep 06 17:10:19 2016 +1000
summary: amfa: fix pg track returns SA_AIS_ERR_INIT [#1998]

opensaf-5.1.x:
[staging:f8bc9f]
changeset: 8006:f8bc9f897235
branch: opensaf-5.1.x
tag: tip
parent: 8004:a7ed45608a5b
user: Long Nguyen
date: Tue Sep 06 17:12:58 2016 +1000
summary: amfa: fix pg track returns SA_AIS_ERR_INIT [#1998]

---

**[tickets:#1998] amf: protection group track non existing csi returns SA_AIS_ERR_INIT**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Mon Sep 05, 2016 07:22 AM UTC by Long HB Nguyen
**Last Updated:** Tue Sep 06, 2016 03:02 AM UTC
**Owner:** Long HB Nguyen

Steps to reproduce
--
- Use the 2N model.
- Modify amf_demo.c as follows:
  + Initialize amf_demo with saAmfInitialize_4 or saAmfInitialize_o4.
  + Add a callback for the protection group.
  + Call saAmfProtectionGroupTrack with a non-existing csi (e.g. "dummy" csi), with the flag SA_TRACK_CURRENT and notificationBuffer set to NULL.

Observed behaviour
--
Before the patches for #1553 were pushed, the testcase returned SA_AIS_ERR_NOT_EXIST.
After the patches for #1553 were pushed, the testcase returns SA_AIS_ERR_INIT.

Initial investigation:
--
In the patches for #1553, Praveen added an internal callback structure (OsafAmfCallbacksT). The structure splits the protection group track callback into two cases:
- SaAmfProtectionGroupTrackCallbackT for versions older than B.04.01.
- SaAmfProtectionGroupTrackCallbackT_4 for versions from B.04.01.

Consider the case where amf_demo is initialized with callbacks for B.04.01 (i.e. saAmfProtectionGroupTrackCallback_4 is set). When amf_demo calls saAmfProtectionGroupTrack, amfa checks saAmfProtectionGroupTrackCallback (which is now NULL) and therefore returns SA_AIS_ERR_INIT.
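The investigation in #1998 can be restated as a small model of the callback check. The sketch below is Python, not the agent's C code, and it does not claim to be the change that was pushed; `Callbacks`, `registered_pg_callback`, `pg_track_current` and the `version_aware` switch are hypothetical names used only to contrast checking the old callback slot with checking the slot that matches the initialised version.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Version = Tuple[str, int, int]        # e.g. ("B", 4, 1) stands for B.04.01
B_04_01: Version = ("B", 4, 1)


@dataclass
class Callbacks:
    """Hypothetical model of the two protection group track callback slots."""
    pg_track: Optional[Callable] = None     # used by versions older than B.04.01
    pg_track_4: Optional[Callable] = None   # used from B.04.01 on


def registered_pg_callback(cbs: Callbacks, version: Version) -> Optional[Callable]:
    # Pick the slot that matches the version the client initialised with.
    return cbs.pg_track_4 if version >= B_04_01 else cbs.pg_track


def pg_track_current(cbs: Callbacks, version: Version, csi_exists: bool,
                     version_aware: bool = True) -> str:
    """Model SA_TRACK_CURRENT with a NULL notificationBuffer."""
    cb = registered_pg_callback(cbs, version) if version_aware else cbs.pg_track
    if cb is None:
        return "SA_AIS_ERR_INIT"       # no callback to deliver the result to
    if not csi_exists:
        return "SA_AIS_ERR_NOT_EXIST"
    return "SA_AIS_OK"


cbs = Callbacks(pg_track_4=lambda *args: None)
print(pg_track_current(cbs, B_04_01, csi_exists=False, version_aware=False))  # SA_AIS_ERR_INIT
print(pg_track_current(cbs, B_04_01, csi_exists=False))                       # SA_AIS_ERR_NOT_EXIST
```

With `version_aware=False` the model reproduces the observed return code; with the version-matched slot it returns the code the testcase produced before #1553.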
[tickets] [opensaf:tickets] #1999 LOG : ntfd on active controller crashed while logging to alarm stream
This may be caused by the bug reported in ticket [#1985]:
osaf/services/saf/logsv/lgs/lgs_clm.cc:120]: (error) Uninitialized variable: rc

That ticket is in review status.

---

**[tickets:#1999] LOG : ntfd on active controller crashed while logging to alarm stream**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 05:15 AM UTC by Srikanth R
**Last Updated:** Tue Sep 06, 2016 05:15 AM UTC
**Owner:** nobody

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 5 nodes ( 2 controllers and 3 payloads with headless feature disabled & no PBE )
AMF Application : 2N model with SUs mapped on PL-3,PL-4

Summary :
--
NTFD crashed on the active controller while logging a notification to the alarm stream.

Steps followed & Observed behaviour
--
-> Initially performed a couple of switchovers and tests on the AMF application.
-> Performed a CLM lock operation on standby SC-1 and later unlocked it.
-> Performed a switchover so that SC-1 became the active controller.
-> Stopped opensafd on PL-4. NTFD on the active controller crashed.

Sep 6 10:18:25 CONTROLLER-1 osafamfd[2262]: NO Node 'PL-4' left the cluster
..
Sep 6 10:18:25 CONTROLLER-1 osafntfd[2242]: osaf_abort(31) called from 0x414d1e with errno=11
Sep 6 10:18:25 CONTROLLER-1 osafamfnd[2272]: NO 'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

-> Below is the excerpt from the ntfd trace.

Sep 6 10:18:25.436394 osafntfd [2242:NtfAdmin.cc:0252] T2 New notification received, id: 682
Sep 6 10:18:25.436398 osafntfd [2242:NtfAdmin.cc:0187] >> processNotification
Sep 6 10:18:25.436404 osafntfd [2242:NtfNotification.cc:0045] T3 constructor 0x685790, notId: 682
Sep 6 10:18:25.436409 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep 6 10:18:25.436412 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep 6 10:18:25.436425 osafntfd [2242:NtfAdmin.cc:0200] T2 notification 682 with type 16384 added, notificationMap size is 1
Sep 6 10:18:25.436431 osafntfd [2242:NtfLogger.cc:0130] >> log
Sep 6 10:18:25.436435 osafntfd [2242:NtfLogger.cc:0132] T2 notification Id=682 received in logger with size 0
Sep 6 10:18:25.436439 osafntfd [2242:NtfLogger.cc:0135] T2 IS LOCAL, logging
Sep 6 10:18:25.436442 osafntfd [2242:NtfLogger.cc:0166] >> checkQueueAndLog
Sep 6 10:18:25.436447 osafntfd [2242:NtfLogger.cc:0196] >> logNotification
Sep 6 10:18:25.436452 osafntfd [2242:ntfsv_mem.c:0761] >> ntfsv_get_ntf_header
Sep 6 10:18:25.436455 osafntfd [2242:ntfsv_mem.c:0782] << ntfsv_get_ntf_header
Sep 6 10:18:25.436460 osafntfd [2242:NtfLogger.cc:0231] T2 Logging notification to alarm stream
Sep 6 10:18:25.436495 osafntfd [2242:lga_api.c:1151] >> saLogWriteLogAsync
Sep 6 10:18:25.436500 osafntfd [2242:lga_api.c:1015] >> handle_log_record
Sep 6 10:18:25.436507 osafntfd [2242:lga_api.c:1110] << handle_log_record
Sep 6 10:18:25.436518 osafntfd [2242:lga_api.c:1229] TR **saLogWriteLogAsync Node not CLM member or stale client**
Sep 6 10:18:25.436524 osafntfd [2242:lga_api.c:1320] << saLogWriteLogAsync
Sep 6 10:18:42.472616 osafntfd [2176:ntfs_main.c:0181] >> initialize
[tickets] [opensaf:tickets] #2001 IMM: AdminOperation returns BAD_HANDLE when invoked second time
- Description has changed:

Diff:

--- old
+++ new
@@ -11,6 +11,18 @@
 Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
 Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)
 
-Note: Test passed in OpenSAF release 5.0
+Sep 6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file /tmp/imma_oi_callbacktimeout.trace, mask=0x
+Sep 6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 svid:26 file:/tmp/imma_oi_callbacktimeout.trace
+Sep 6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 (testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
+Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
+Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
+Sep 6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
+Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 21 - ignoring
+Sep 6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down on syncronous request, discarding request
+Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
+Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
+
+
+Note: **Test passed in OpenSAF release 5.0**
 
 Agent traces and immnd, immd traces attached

---

**[tickets:#2001] IMM: AdminOperation returns BAD_HANDLE when invoked second time**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 06, 2016 07:14 AM UTC
**Owner:** nobody
**Attachments:**

- [AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip) (95.1 kB; application/zip)

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:

Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback for longer than the OI_CALLBACK_TIMEOUT value.
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait.

Observed Behavior:
Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep 6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file /tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep 6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep 6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 (testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep 6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 21 - ignoring
Sep 6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down on syncronous request, discarding request
Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)

Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached
[tickets] [opensaf:tickets] #2001 IMM: AdminOperation returns BAD_HANDLE when invoked second time
---

**[tickets:#2001] IMM: AdminOperation returns BAD_HANDLE when invoked second time**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 06, 2016 07:14 AM UTC
**Owner:** nobody
**Attachments:**

- [AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip) (95.1 kB; application/zip)

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:

Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback for longer than the OI_CALLBACK_TIMEOUT value.
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait.

Observed Behavior:
Step1 will return SA_AIS_ERR_TIMEOUT (Expected)
Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Note: Test passed in OpenSAF release 5.0

Agent traces and immnd, immd traces attached
[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller
---

**[tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the controller**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 06, 2016 06:04 AM UTC
**Owner:** nobody
**Attachments:**

- [Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog) (716.7 kB; application/octet-stream)
- [Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog) (696.4 kB; application/octet-stream)

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 1PBE enabled with 30K objects )

Summary :
--
A cluster reset happened because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.

Steps followed & Observed behaviour
--
1. Invoked failover.
2. After a few successful failovers, the new active controller rebooted because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. This happened while the previous active controller was joining the cluster in the standby role, which resulted in a cluster reset.

[Timeline: Sep 6 00:13:02 sofo-s2]

Sep 6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, dest:13)
Sep 6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed.
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: NO 'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: ER safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep 6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:
1. Syslog attached.
2. msgnd & msgd trace not enabled.
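The assertion in the syslog of #2000 is a bound on the length of a non-extended name. The sketch below only illustrates that condition and one defensive alternative to aborting; it is Python, not OpenSAF code, and it does not reproduce osaf_extended_name_length. The value 256 for the limit is an assumption to be checked against the headers, and `unextended_length_ok`, `encode_name` and the wire layout are hypothetical.

```python
SA_MAX_UNEXTENDED_NAME_LENGTH = 256  # assumed value; verify against the AIS headers


def unextended_length_ok(length: int) -> bool:
    """Mirror the condition from the log: length < SA_MAX_UNEXTENDED_NAME_LENGTH."""
    return 0 <= length < SA_MAX_UNEXTENDED_NAME_LENGTH


def encode_name(length: int, value: bytes) -> bytes:
    """Hypothetical encoder that rejects a bad name instead of aborting the process."""
    if not unextended_length_ok(length) or length > len(value):
        raise ValueError(f"invalid name length {length}")
    # Illustrative layout: 2-byte big-endian length followed by the name bytes.
    return length.to_bytes(2, "big") + value[:length]


print(encode_name(4, b"safQ"))        # b'\x00\x04safQ'
try:
    encode_name(40000, b"\x00" * 8)   # e.g. a length field that was never initialised
except ValueError as err:
    print(err)                        # invalid name length 40000
```

Whether msgd should reject such a name or whether the name should never reach the encoder in that state is a question for the ticket; the model only shows which inputs trip the condition.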