[tickets] [opensaf:tickets] #2013 IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT
- **status**: assigned --> wontfix - **Comment**: Sep 8 16:56:06.484169 osafimmnd [3620:immnd_proc.c:1679] T5 tmout:1000 ste:10 ME:5 RE:5 crd:1 rim:FROM_FILE 4.3A:1 2Pbe:0 VetA/B: 0/0 othsc:1/2010f Sep 8 16:56:08.495410 osafimmnd [3620:ImmModel.cc:14077] T5 Timeout on Search continuation 564113889559073 Sep 8 16:56:08.495530 osafimmnd [3620:ImmModel.cc:14200] T5 Timeout on sImplDetachTime implid:0 Sep 8 16:56:08.495606 osafimmnd [3620:immnd_proc.c:1236] T5 Timeout on search op waiting on 1 implementer(s) Sep 8 16:56:08.495634 osafimmnd [3620:ImmModel.cc:12830] >> fetchSearchReqContinuation Sep 8 16:56:08.495660 osafimmnd [3620:ImmModel.cc:12838] T5 REQ SEARCH CONTINUATION 544 FOUND FOR 564113889559073 Sep 8 16:56:08.495687 osafimmnd [3620:ImmModel.cc:12841] << fetchSearchReqContinuation Sep 8 16:56:08.495708 osafimmnd [3620:immnd_evt.c:1029] >> search_req_continue Sep 8 16:56:08.495735 osafimmnd [3620:immnd_evt.c:1044] T2 SEARCH NEXT, Look for id:545 Sep 8 16:56:08.496293 osafimmnd [3620:immnd_evt.c:1195] TR Finalizing search node, err = 5 Sep 8 16:56:08.496337 osafimmnd [3620:ImmModel.cc:1761] TR Deleting iterator searchOp 0x7ebcb0 Sep 8 16:56:08.496367 osafimmnd [3620:immnd_evt.c:1375] >> freeSearchNext Sep 8 16:56:08.496388 osafimmnd [3620:immnd_evt.c:1377] T2 objectName:DistObj1=DistRunTime Sep 8 16:56:08.496416 osafimmnd [3620:immnd_evt.c:1412] << freeSearchNext Sep 8 16:56:08.496434 osafimmnd [3620:immnd_evt.c:1221] << search_req_continue Sep 8 16:56:08.997941 osafimmnd [3620:immsv_evt.c:5422] T8 Received: IMMND_EVT_A2ND_SEARCHNEXT (17) from 2020f Sep 8 16:56:08.998012 osafimmnd [3620:immnd_evt.c:1498] >> immnd_evt_proc_search_next Sep 8 16:56:08.998028 osafimmnd [3620:immnd_evt.c:1509] T2 SEARCH NEXT, Look for id:545 Sep 8 16:56:08.998504 osafimmnd [3620:immnd_evt.c:1520] ER Could not find search node for search-ID:545 Sep 8 16:56:08.999722 osafimmnd [3620:immnd_evt.c:1732] << immnd_evt_proc_search_next 1. 
There is a timeout on the search operation while waiting for runtime attributes, and the search operation is freed.
2. When the search next operation is called again, the search ID is not found and BAD_HANDLE is returned.
3. The search handle is not corrupt. The code flow has not changed from 5.0.

---

** [tickets:#2013] IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT**

**Status:** wontfix
**Milestone:** 5.1.RC2
**Created:** Thu Sep 08, 2016 12:10 PM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 10:06 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [SearchTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2013/attachment/SearchTmOut.zip) (883.9 kB; application/zip)

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes

Summary:
Steps to Reproduce
1. Create a runtime/config object
2. Do Search Initialize()
3. Delete the object created in Step1
4. Do SearchNext()
5. Do SearchNext() again

Observed Behavior:
Step4 will return SA_AIS_ERR_TIMEOUT (Expected)
Step5 is returning SA_AIS_ERR_BAD_HANDLE **(SA_AIS_ERR_NOT_EXIST is expected)**

**Note: Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2033 AMF: Update documentation for admin op continuation after headless
--- ** [tickets:#2033] AMF: Update documentation for admin op continuation after headless** **Status:** assigned **Milestone:** 5.1.RC2 **Created:** Wed Sep 14, 2016 06:09 AM UTC by Minh Hon Chau **Last Updated:** Wed Sep 14, 2016 06:09 AM UTC **Owner:** Minh Hon Chau Need to update README/PR doc for #1987 (fixed) and maybe #1988 (review)
[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP
- **status**: assigned --> review - **Milestone**: 4.7.2 --> 5.0.1 - **Comment**: split-brain is different issue and we have ticket #2030 to debug the split-brain case , so I published the patch of this ticket. --- ** [tickets:#2014] Rebooted controller not detected in TCP** **Status:** review **Milestone:** 5.0.1 **Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt **Last Updated:** Tue Sep 13, 2016 04:39 PM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar) - [tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch) (5.5 kB; application/octet-stream) OS environment: Debian Jessie (OpenSAF is running on bare metal, no containers or VMs) 4.4.7 kernel Network eth0, bonded, OVS (I have tried all of them and the problem is there in all configurations) In 20% of the cases a "reboot -f" on controller2 is not detected and acted on. What is in the mds.log is . 
Sep 7 6:44:23.918566 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x,1> Sep 7 6:44:23.918595 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790> Sep 7 6:44:34.018662 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or Error occured Sep 7 6:44:34.018751 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19) Sep 7 6:44:34.018789 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x,1> Sep 7 6:44:34.018818 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790> Sep 7 6:44:44.118832 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or Error occured Sep 7 6:44:44.118919 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19) Sep 7 6:44:44.118955 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x,1> Sep 7 6:44:44.118984 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790> Sep 7 6:44:54.218987 osafamfd[41365] ERR |MDS_SND_RCV: Timeout or Error occured Sep 7 6:44:54.219085 osafamfd[41365] ERR |MDS_SND_RCV: Timeout occured on red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19) Sep 7 6:44:54.219139 osafamfd[41365] ERR |MDS_SND_RCV: Adest=<0x,1> Sep 7 6:44:54.219168 osafamfd[41365] ERR |MDS_SND_RCV: Anchor=<0x0002020f,1790> Still, there is nothing in the syslog indicating that controller2 has left the cluster. This is for TCP. When the node comes back on line (without opensaf being started) controller 1 notice finally and fail over apps. When the reboot is not detected the tcp keep alives stops and goes into retransmits instead. I have attached 2 tshark sessions captured from controller1, capturing traffic between controller1 and controller2. The failed reboot detect is captured in "ctrl2_failed_detection.trc" and for a working detection there is a file "ctrl2_working.trc" I have also attached all logs in /var/log/opensaf and the syslog (all from controller one). 
It appears to me that we are hitting something similar to "http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect"

// Jonas
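The keepalive and retransmission behaviour described in this report can be illustrated with the Linux socket options involved. What follows is a minimal sketch, not the contents of the attached tcp_user_timeout_2014.patch: the function name `demo_set_peer_failure_timeouts` and its parameters are hypothetical, and the options used are Linux-specific. The assumption, following the linked Stack Overflow question, is that keepalive probes stop once unacknowledged data is being retransmitted, so `TCP_USER_TIMEOUT` is set as well to bound how long transmitted data may remain unacknowledged.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not an OpenSAF function): configure both the
 * keepalive probes and the user timeout on a connected TCP socket.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int demo_set_peer_failure_timeouts(int fd, int idle_s, int intvl_s, int cnt,
                                   unsigned int user_timeout_ms)
{
    int on = 1;

    /* Enable keepalive probing on an otherwise idle connection. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    /* Idle time in seconds before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    /* Interval in seconds between probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) != 0)
        return -1;
    /* Number of unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) != 0)
        return -1;
    /* Upper bound in milliseconds for data to stay unacknowledged. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
                   sizeof user_timeout_ms) != 0)
        return -1;
    return 0;
}
```

A caller would invoke this once per connection, for example `demo_set_peer_failure_timeouts(fd, 10, 2, 3, 16000)`. The values are placeholders, not tuned recommendations.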
[tickets] [opensaf:tickets] #2032 ckpt: ckpttest for long dn (5 55, 5 57, 7 12) is failing
---

** [tickets:#2032] ckpt: ckpttest for long dn (5 55, 5 57, 7 12) is failing**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed Sep 14, 2016 04:40 AM UTC by Quyen Dao
**Last Updated:** Wed Sep 14, 2016 04:40 AM UTC
**Owner:** nobody

Changeset: 8064:99410ba8cc21

root@SC-1:~# immcfg -a longDnsAllowed=1 opensafImm=opensafImm,safApp=safImmService
root@SC-1:~# export SA_ENABLE_EXTENDED_NAMES=1
root@SC-1:~# ckpttest 5 55

Suite 5: CKPT API saCkptCheckpointOpen()
55 FAILED To verify creating a ckpt with invalid extended name length (expected OUT_OF_RANGE, got SA_AIS_OK (1));

= Test Result:
Total: 1
Passed: 0
Failed: 1

root@SC-1:~# ckpttest 5 57

Suite 5: CKPT API saCkptCheckpointOpen()
57 FAILED To verify openAsync a ckpt with invalid extended name length (expected OUT_OF_RANGE, got SA_AIS_OK (1));

= Test Result:
Total: 1
Passed: 0
Failed: 1

root@SC-1:~# ckpttest 7 12

Suite 7: CKPT API saCkptCheckpointUnlink()
12 FAILED To test unlink a ckpt with invalid extended name (expected OUT_OF_RANGE, got SA_AIS_OK (1));

= Test Result:
Total: 1
Passed: 0
Failed: 1

root@SC-1:~#
[tickets] [opensaf:tickets] #1991 AMF: Existing PG tracking should not be stopped for CURRENT flag
- **status**: unassigned --> accepted
- **assigned_to**: Long HB Nguyen

---

** [tickets:#1991] AMF: Existing PG tracking should not be stopped for CURRENT flag**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 09:44 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 10:09 AM UTC
**Owner:** Long HB Nguyen

5.1.FC : changeset - 6997

Issue : Existing PG tracking should not be stopped for CURRENT call

Steps performed :
-> Call saAmfInitialize_4()
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CHANGES flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrackStop()

Observed output : TrackStop returns ERR_NOT_EXIST, indicating that tracking was not started earlier.

Expected output: The TrackStop() API should return SA_AIS_OK; in the earlier release, the API returns SA_AIS_OK.

According to the B04.01 spec 7.11.1 page 318, tracking should not be stopped until TrackStop() is called explicitly.

Once saAmfProtectionGroupTrack_4() has been called with trackFlags containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification callbacks can only be stopped by an invocation of saAmfProtectionGroupTrackStop().
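The expected behaviour in this ticket can be captured in a small state model. This is a toy sketch under stated assumptions, not AMF agent code: every `demo_` and `DEMO_` name is hypothetical, the numeric constants are believed to mirror the corresponding `SA_AIS_*` and `SA_TRACK_*` values, and the model ignores callbacks, handles, and the rule that the two CHANGES flags are mutually exclusive. It only shows that a call with the CURRENT flag alone is a one-shot query and leaves existing change tracking active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; values assumed to match SA_AIS_OK and
 * SA_AIS_ERR_NOT_EXIST. */
enum { DEMO_AIS_OK = 1, DEMO_AIS_ERR_NOT_EXIST = 12 };

/* Hypothetical flags; values assumed to match SA_TRACK_CURRENT,
 * SA_TRACK_CHANGES and SA_TRACK_CHANGES_ONLY. */
enum {
    DEMO_TRACK_CURRENT      = 0x01,
    DEMO_TRACK_CHANGES      = 0x02,
    DEMO_TRACK_CHANGES_ONLY = 0x04
};

/* Toy per-protection-group tracking state. */
struct demo_pg_tracker {
    bool tracking; /* true while change notifications are active */
};

/* Model of a track call: only the CHANGES flags start tracking. */
int demo_pg_track(struct demo_pg_tracker *t, uint8_t flags)
{
    if (flags & (DEMO_TRACK_CHANGES | DEMO_TRACK_CHANGES_ONLY))
        t->tracking = true;
    /* DEMO_TRACK_CURRENT alone: one-shot query, state is left untouched. */
    return DEMO_AIS_OK;
}

/* Model of a track-stop call: fails only if tracking was never started. */
int demo_pg_track_stop(struct demo_pg_tracker *t)
{
    if (!t->tracking)
        return DEMO_AIS_ERR_NOT_EXIST;
    t->tracking = false;
    return DEMO_AIS_OK;
}
```

Replaying the ticket's sequence (CURRENT, CHANGES, CURRENT, then stop) against this model gives `DEMO_AIS_OK` from the stop call, which matches the expected output stated above.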
[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"
- **status**: unassigned --> assigned - **assigned_to**: A V Mahesh (AVM) --- ** [tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"** **Status:** assigned **Milestone:** 4.7.2 **Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell **Last Updated:** Tue Sep 13, 2016 01:01 PM UTC **Owner:** A V Mahesh (AVM) osafdtm does not handle rapid consecutive node reboots properly. I got the following errors in syslog: ~~~ Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 ~~~ Here are the steps to reproduce this problem in UML: ./opensaf start (wait until the cluster comes up) ./opensaf nodestop 2 (wait a few seconds) ./opensaf nodestart 2 ./opensaf nodestart 2 The last two commands should be execute quickly after each other, maybe with one second delay in between them. It seems that osafdtmd asserts and dies when this happens. Here is the result from a second run of the above test: ~~~ Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed. 
Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'SC-1' Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-4' Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-5' Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-3' Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the node Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60 ~~~ Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf nodestart" above. Thus, there are probably two SC-2 nodes at the same time, and the error message "Node already exit in the cluster with smiler configuration" should be interpreted as "duplicate node detected in the network". 
Reducing the priority of this defect to "minor". Still, two problems ought to be fixed: the error message should be changed so that it is clear what it means, and osafdtmd should not assert (it could call opensaf_reboot() if there is a configuration problem, but asserting indicates a software problem).
[tickets] [opensaf:tickets] #1895 ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found
- **Milestone**: 5.1.RC2 --> 4.7.2

---

** [tickets:#1895] ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found**

**Status:** review
**Milestone:** 4.7.2
**Created:** Fri Jun 24, 2016 03:24 AM UTC by Vo Minh Hoang
**Last Updated:** Tue Sep 13, 2016 10:11 AM UTC
**Owner:** Canh Truong
**Attachments:**

- [osafntfd.txt](https://sourceforge.net/p/opensaf/tickets/1895/attachment/osafntfd.txt) (705.5 kB; text/plain)

Failed when running test suite 2:

osafntfd [463:ntfs_evt.c:0338] >> proc_unsubscribe_msg: client_id 28, subscriptionId 111
osafntfd [463:NtfAdmin.cc:0553] ER NtfAdmin::subscriptionRemoved client 28 not found
osafntfd [463:ntfs_evt.c:0341] << proc_unsubscribe_msg

Currently, when the last client is finalized, ntfa uninstalls the MDS connection. This causes an NCSMDS_DOWN event to be sent to ntfs, and ntfs removes all clients related to this MDS. But if a new client is initialized immediately after finalizing, ntfs may receive the initialization message before the NCSMDS_DOWN event. The new client is then removed without being finalized, and the subsequent subscribe action fails.

Similar ticket: https://sourceforge.net/p/opensaf/tickets/1818/
[tickets] [opensaf:tickets] #2031 imm:README files are missing when opensaf is downloaded
--- ** [tickets:#2031] imm:README files are missing when opensaf is downloaded** **Status:** accepted **Milestone:** 5.0.1 **Created:** Wed Sep 14, 2016 02:40 AM UTC by Neelakanta Reddy **Last Updated:** Wed Sep 14, 2016 02:40 AM UTC **Owner:** Neelakanta Reddy Update the Makefile.am with all README files
[tickets] [opensaf:tickets] #2028 log: write_log_record_hdl get bad file descriptor
- Description has changed: Diff: --- old +++ new @@ -10,11 +10,11 @@ TRACE("%s - stream files initiated", __FUNCTION__); } ``` -In that case - `p_fd = -1`, `log_stream_write_h` should inform the client TRY_AGAIN by returning the value `(-2)`. +In that case - `p_fd = -1`, `log_stream_write_h` should inform the client TRY_AGAIN. Besides, there is an other problem at file closing. Look at the functions `fileclose_hdl` and `fileclose_h`. The file descriptor should be set to `invalid` in `fileclose_hdl`, otherwise `close file` request will re-send to the file handle thread even that file is already closed. -Above cases usually happens when the file sytem is busy. Osaflogd TRACE: +Above cases usually happens when the file sytem is busy. Extract from syslog: > 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or > resource busy > 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or > resource busy --- ** [tickets:#2028] log: write_log_record_hdl get bad file descriptor** **Status:** review **Milestone:** 5.0.1 **Created:** Tue Sep 13, 2016 09:54 AM UTC by Vu Minh Nguyen **Last Updated:** Tue Sep 13, 2016 10:50 AM UTC **Owner:** Vu Minh Nguyen In current code, logsv passes the `WRITE REQUEST` to the handle thread even the file descriptor is invalid. Here is some code of log_stream_write_h()@lgs_stream.cc ``` C log_initiate_stream_files(stream); if (*stream->p_fd == -1) { TRACE("%s - Initiating stream files \"%s\" Failed", __FUNCTION__, stream->name.c_str()); } else { TRACE("%s - stream files initiated", __FUNCTION__); } ``` In that case - `p_fd = -1`, `log_stream_write_h` should inform the client TRY_AGAIN. Besides, there is an other problem at file closing. Look at the functions `fileclose_hdl` and `fileclose_h`. The file descriptor should be set to `invalid` in `fileclose_hdl`, otherwise `close file` request will re-send to the file handle thread even that file is already closed. Above cases usually happens when the file sytem is busy. 
Extract from syslog:

> 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or
> resource busy
> 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or
> resource busy
> 2016-07-02 00:32:52 SC-1 osaflogd[460]: ER write_log_record_hdl - write
> FAILED: Bad file descriptor
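The two fixes proposed in this ticket (report TRY_AGAIN when the descriptor is invalid, and invalidate the descriptor when closing) can be sketched in isolation. This is a minimal illustration, not the logsv implementation: `struct demo_stream`, `demo_stream_write`, `demo_stream_close` and the `DEMO_WRITE_*` codes are hypothetical stand-ins, and the choice to invalidate the descriptor even when `close()` fails is an assumption of this sketch, based on the Linux guidance that a failed `close()` should not be retried.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the stream state; not the real logsv type. */
struct demo_stream {
    int fd; /* -1 means "no open file" */
};

/* Hypothetical result codes for the write path. */
enum demo_write_rc {
    DEMO_WRITE_OK = 0,
    DEMO_WRITE_TRY_AGAIN = 1,
    DEMO_WRITE_FAILED = 2
};

/* Refuse to hand an invalid descriptor to write(); ask the caller to retry. */
enum demo_write_rc demo_stream_write(struct demo_stream *s, const void *buf,
                                     size_t len)
{
    if (s->fd == -1)
        return DEMO_WRITE_TRY_AGAIN;

    ssize_t n = write(s->fd, buf, len);
    /* This sketch treats a short write like a failure. */
    return (n == (ssize_t)len) ? DEMO_WRITE_OK : DEMO_WRITE_FAILED;
}

/* Close at most once: the descriptor is invalidated before close() runs,
 * so a failed close is never repeated on a stale descriptor. */
int demo_stream_close(struct demo_stream *s)
{
    if (s->fd == -1)
        return 0; /* already closed, nothing to do */

    int fd = s->fd;
    s->fd = -1;
    return close(fd);
}
```

With this shape, a write attempted after a close returns `DEMO_WRITE_TRY_AGAIN` instead of reaching `write()` with a bad descriptor, which is the failure shown in the last syslog line above.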
[tickets] [opensaf:tickets] #1988 AMF: Admin operation continuation does not work with short cluster init timeout
- **status**: assigned --> review

---

** [tickets:#1988] AMF: Admin operation continuation does not work with short cluster init timeout**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 12:04 AM UTC by Minh Hon Chau
**Last Updated:** Tue Sep 13, 2016 11:02 AM UTC
**Owner:** Minh Hon Chau

In the scenario of admin continuation after headless, if saAmfClusterStartupTimeout is configured with a short value, the admin continuation is initiated when saAmfClusterStartupTimeout expires while the SU is still OUT OF SERVICE. The eventual result is failure of the admin operation after headless.
[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT
The syslog, imma and immnd traces are not matching. Please provide the correct logs --- ** [tickets:#2001] IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT** **Status:** assigned **Milestone:** 5.1.RC2 **Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava **Last Updated:** Tue Sep 13, 2016 10:06 AM UTC **Owner:** Neelakanta Reddy **Attachments:** - [AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip) (95.1 kB; application/zip) OS : Suse 64bit Changeset : 7997 ( 5.1.FC) Setup : 4 nodes 1 PBE enabled Summary: Steps to Reproduce 1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in callback with time more that OI_CALLBACK_TIMEOUT value 2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait OR Invoke any Ccb operation Observed Bahavior: Step1 will return SA_AIS_ERR_TIMEOUT (Expected) Step2 is returning SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected) Sep 6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file /tmp/imma_oi_callbacktimeout.trace, mask=0x Sep 6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 svid:26 file:/tmp/imma_oi_callbacktimeout.trace Sep 6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 (testOiTmout_verifyAdminOpCallback_37) <343, 2010f> Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2 Sep 6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply. Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 21 - ignoring Sep 6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down on syncronous request, discarding request Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37) Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37) Note: **Test passed in OpenSAF release 5.0** Agent traces and immnd, immd traces attached
[tickets] [opensaf:tickets] #1997 IMM: immnd fails to update si while bringing up opensaf with 2PBE
- **status**: assigned --> unassigned - **assigned_to**: Neelakanta Reddy --> nobody - **Component**: imm --> amf - **Comment**: Sep 2 16:54:13 SLOT1 osafimmpbed: WA Start prepare for ccb: 10004/4294967300 towards slave PBE returned: '12' from Immsv Sep 2 16:54:13 SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA update Ccb:10004/4294967300 towards PBE-B Sep 2 16:54:13 SLOT1 osafimmpbed: NO 2PBE Error (18) in PRTA update (ccbId:10004) Sep 2 16:54:13 SLOT1 osafimmnd[3632]: WA update of PERSISTENT runtime attributes in object 'safSi=NoRed3,safApp=OpenSAF' REVERTED. PBE rc:18 Sep 2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18 2PBE case, both the PBEs in the controller must be up. From the logs only PBE at slot1 is up and slot2 is not yet joined the cluster. The RT-update will fail, because of slo2 PBE is not available. From, the AMF perspective, this has to be analayzed or Error can be made as Warning for RT-updates. Sep 2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18 --- ** [tickets:#1997] IMM: immnd fails to update si while bringing up opensaf with 2PBE** **Status:** unassigned **Milestone:** 5.1.RC2 **Created:** Fri Sep 02, 2016 11:46 AM UTC by Chani Srivastava **Last Updated:** Tue Sep 13, 2016 10:08 AM UTC **Owner:** nobody **Attachments:** - [LogAMF.zip](https://sourceforge.net/p/opensaf/tickets/1997/attachment/LogAMF.zip) (432.4 kB; application/zip) setup: Version - OpenSAF 5.1.FC : changeset - 7997 4-Node cluster 2PBE enabled Bring up opensaf on a controller with 2PBE enable. 
IMMND throwing error

Attachments: syslog, amfd and immnd traces

Sep 2 16:54:13 SLOT1 osafimmpbed: WA Start prepare for ccb: 10004/4294967300 towards slave PBE returned: '12' from Immsv
Sep 2 16:54:13 SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA update Ccb:10004/4294967300 towards PBE-B
Sep 2 16:54:13 SLOT1 osafimmpbed: NO 2PBE Error (18) in PRTA update (ccbId:10004)
**Sep 2 16:54:13 SLOT1 osafimmnd[3632]: WA update of PERSISTENT runtime attributes in object 'safSi=NoRed3,safApp=OpenSAF' REVERTED. PBE rc:18**
**Sep 2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18**
Sep 2 16:54:14 SLOT1 osafimmnd[3632]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db

Note-
1. OpenSAF is successfully started
2. Issue not seen with 1PBE

Once controller is up, amf-state si gives

safSi=SC-2N,safApp=OpenSAF
    saAmfSIAdminState=UNLOCKED(1)
    saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=NoRed4,safApp=OpenSAF
    saAmfSIAdminState=UNLOCKED(1)
    saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed1,safApp=OpenSAF
    saAmfSIAdminState=UNLOCKED(1)
    saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=NoRed2,safApp=OpenSAF
    saAmfSIAdminState=UNLOCKED(1)
    saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed3,safApp=OpenSAF
    saAmfSIAdminState=UNLOCKED(1)
    saAmfSIAssignmentState=UNASSIGNED(1)
[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP
Anders, it is possible. I am seeing the same entry in my system when I get the split-brain. After I fixed the MAC in OVS the problem went away though. --- ** [tickets:#2014] Rebooted controller not detected in TCP** **Status:** assigned **Milestone:** 4.7.2 **Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt **Last Updated:** Tue Sep 13, 2016 12:14 PM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar) - [tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch) (5.5 kB; application/octet-stream)
[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"
- Description has changed: Diff: --- old +++ new @@ -39,3 +39,6 @@ Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60 ~~~ + +Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf nodestart" above. Thus, there are probably two SC-2 nodes at the same time, and the error message "Node already exit in the cluster with smiler configuration" should be interpreted as "duplicate node detected in the network". Reducing the priority of this defect to "minor". Still two problems ought to be fixed: the error message should be changed so that it is clear what it means, and osafdtmd should not assert (it could call opensaf_reboot() if a there is a configuration problem, but asserting idicates a software problem). + - **Priority**: major --> minor --- ** [tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell **Last Updated:** Tue Sep 13, 2016 12:30 PM UTC **Owner:** nobody osafdtm does not handle rapid consecutive node reboots properly. I got the following errors in syslog: ~~~ Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 ~~~ Here are the steps to reproduce this problem in UML: ./opensaf start (wait until the cluster comes up) ./opensaf nodestop 2 (wait a few seconds) ./opensaf nodestart 2 ./opensaf nodestart 2 The last two commands should be execute quickly after each other, maybe with one second delay in between them. It seems that osafdtmd asserts and dies when this happens. 
Here is the result from a second run of the above test: ~~~ Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed. Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'SC-1' Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-4' Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-5' Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-3' Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the node Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60 
~~~ Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf nodestart" above. Thus, there are probably two SC-2 nodes at the same time, and the error message "Node already exit in the cluster with smiler configuration" should be interpreted as "duplicate node detected in the network". Reducing the priority of this defect to "minor". Still, two problems ought to be fixed: the error message should be changed so that it is clear what it means, and osafdtmd should not assert (it could call opensaf_reboot() if there is a configuration problem, but asserting indicates a software problem).
[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"
- Description has changed: Diff: --- old +++ new @@ -1,9 +1,9 @@ osafdtm does not handle rapid consecutive node reboots properly. I got the following errors in syslog: ~~~ -var/SC-2/log/messages:Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration -var/SC-2/log/messages:Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 -var/SC-2/log/messages:Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 +Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration +Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 +Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 ~~~ Here are the steps to reproduce this problem in UML: @@ -16,3 +16,26 @@ ./opensaf nodestart 2 The last two commands should be execute quickly after each other, maybe with one second delay in between them. + +It seems that osafdtmd asserts and dies when this happens. Here is the result from a second run of the above test: + +~~~ +Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration +Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed. 
+Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success +Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'SC-1' +Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-4' +Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-5' +Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-3' +Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the node +Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60 + +~~~ --- ** [tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"** **Status:** unassigned **Milestone:** 4.7.2 **Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell **Last Updated:** Tue Sep 13, 2016 12:17 PM UTC **Owner:** nobody osafdtm does not handle rapid consecutive node 
reboots properly. I got the following errors in syslog: ~~~ Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0 ~~~ Here are the steps to reproduce this problem in UML: ./opensaf start (wait until the cluster comes up) ./opensaf nodestop 2 (wait a few seconds) ./opensaf nodestart 2 ./opensaf nodestart 2 The last two commands should be execute quickly after each other, maybe with one second delay in between them. It seems that osafdtmd asserts and dies when this happens. Here is the result from a second run of the above test: ~~~ Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed. Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success Sep 13 14:25:58 SC-2 local0.err osafclmna[
[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"
Needless to say, the error message itself is also faulty here. I suppose "exit" should be "exists", and "smiler" should be "similar"? I am just guessing... :-)

---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:10 PM UTC
**Owner:** nobody

osafdtm does not handle rapid consecutive node reboots properly. I got the following errors in syslog:

~~~
var/SC-2/log/messages:Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
var/SC-2/log/messages:Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
var/SC-2/log/messages:Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed in quick succession, with perhaps a one-second delay between them.
[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP
Maybe your split-brain problems could be related to the ticket [#2030] that I just filed on DTM? --- ** [tickets:#2014] Rebooted controller not detected in TCP** **Status:** assigned **Milestone:** 4.7.2 **Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt **Last Updated:** Tue Sep 13, 2016 12:13 PM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar) - [tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch) (5.5 kB; application/octet-stream)
[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP
I actually need to do more tests. From the patch's point of view I think it is looking good. The split brain seems to be related to OVS bringing up the port with a new MAC address every time. I have run some tests on eth0 (without OVS) and have not been able to reproduce the split brain. Note that with TIPC as the transport the split brain also never happens, even with OVS. I will run some more tests today and get back with a conclusion. The split brain occurs after "reboot -f" on controller2, when it tries to rejoin the cluster after coming back up. After that, the two controllers run side by side, both active, and there is no reboot. The reboot now seems to be detected every time, so the patch definitely fixed that. --- ** [tickets:#2014] Rebooted controller not detected in TCP** **Status:** assigned **Milestone:** 4.7.2 **Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt **Last Updated:** Tue Sep 13, 2016 04:25 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) (84.1 kB; application/x-compressed-tar) - [tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch) (5.5 kB; application/octet-stream)
[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"
---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:10 PM UTC
**Owner:** nobody

osafdtm does not handle rapid consecutive node reboots properly. I got the following errors in syslog:

~~~
var/SC-2/log/messages:Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM: Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
var/SC-2/log/messages:Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
var/SC-2/log/messages:Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed in quick succession, with perhaps a one-second delay between them.
[tickets] [opensaf:tickets] #1816 IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when ERR_LIBRARY was expected
- **status**: accepted --> review

---

** [tickets:#1816] IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when ERR_LIBRARY was expected**

**Status:** review
**Milestone:** 4.7.2
**Created:** Mon May 09, 2016 07:27 AM UTC by Chani Srivastava
**Last Updated:** Mon May 09, 2016 07:29 AM UTC
**Owner:** Neelakanta Reddy

This was found as part of validating ticket #1808.

Code snippet: imma_oi_api.c:3749

~~~
if(immsv_om_handle_initialize) {/*This is always the first immsv_om_ call */
	rc = immsv_om_handle_initialize(&privateOmHandle, &version);
} else {
	TRACE("ERR_LIBRARY: Error in library linkage. libSaImmOm.so is not linked");
	rc = SA_AIS_ERR_LIBRARY;
}

if(rc != SA_AIS_OK) {
	TRACE("ERR_TRY_AGAIN: failed to obtain internal om handle rc:%u", rc);
	rc = SA_AIS_ERR_TRY_AGAIN;
	goto lock_fail; /* We are not locked and nothing to de-allocate. */
}
~~~

When rc is set to SA_AIS_ERR_LIBRARY there is no goto, so the next if condition is executed and overwrites rc with SA_AIS_ERR_TRY_AGAIN.
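The control-flow problem reported in this ticket can be reproduced and corrected in isolation. The following is a minimal sketch, not the actual fix applied to imma_oi_api.c: the `rc_t` codes, `init_buggy`, `init_fixed` and the two test doubles are stand-ins invented for illustration, and the `om_init` function pointer merely models the optionally linked `immsv_om_handle_initialize` symbol.

```c
#include <stddef.h>

/* Stand-in return codes (hypothetical; the real ones are the SA_AIS_* values). */
typedef enum { RC_OK = 1, RC_ERR_LIBRARY, RC_ERR_TRY_AGAIN } rc_t;

/* Models the optionally linked initializer; NULL means "library not linked". */
typedef rc_t (*om_init_fn)(void);

/* Same shape as the reported code: the linkage error falls through into the
 * generic failure check and is overwritten with TRY_AGAIN. */
rc_t init_buggy(om_init_fn om_init)
{
	rc_t rc;

	if (om_init) {
		rc = om_init();
	} else {
		rc = RC_ERR_LIBRARY;
	}
	if (rc != RC_OK) {
		rc = RC_ERR_TRY_AGAIN; /* clobbers RC_ERR_LIBRARY */
	}
	return rc;
}

/* One possible correction: leave as soon as the linkage error is detected,
 * so only a failed initializer call is mapped to TRY_AGAIN. */
rc_t init_fixed(om_init_fn om_init)
{
	if (!om_init) {
		return RC_ERR_LIBRARY;
	}
	if (om_init() != RC_OK) {
		return RC_ERR_TRY_AGAIN;
	}
	return RC_OK;
}

/* Test doubles for the initializer. */
rc_t om_init_ok(void) { return RC_OK; }
rc_t om_init_fail(void) { return RC_ERR_LIBRARY; }
```

In the real function the early exit would presumably be a `goto lock_fail` placed inside the else branch rather than a `return`, so that the existing cleanup path is kept; that detail is an assumption based on the quoted snippet.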
[tickets] [opensaf:tickets] #1986 log: logtest fails when run after immomtest
- **status**: accepted --> review

---

** [tickets:#1986] log: logtest fails when run after immomtest**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Tue Aug 30, 2016 09:03 AM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 10:10 AM UTC
**Owner:** Vu Minh Nguyen

If I first run immomtest and then logtest, I get the following result:

~~~
Suite 1: Library Life Cycle
1 PASSED saLogInitialize() OK;
2 PASSED saLogInitialize() with NULL pointer to handle;
3 PASSED saLogInitialize() with NULL pointer to callbacks;
4 PASSED saLogInitialize() with NULL callbacks AND version;
5 PASSED saLogInitialize() with uninitialized handle;
6 PASSED saLogInitialize() with uninitialized version;
7 PASSED saLogInitialize() with too high release level;
8 PASSED saLogInitialize() with minor version set to 1;
9 PASSED saLogInitialize() with major version set to 3;
10 PASSED saLogSelectionObjectGet() OK;
11 PASSED saLogSelectionObjectGet() with NULL log handle;
12 PASSED saLogDispatch() OK;
13 PASSED saLogFinalize() OK;
14 PASSED saLogFinalize() with NULL log handle;
Suite 2: Log Service Operations
1 PASSED saLogStreamOpen_2() system stream OK;
2 PASSED saLogStreamOpen_2() notification stream OK;
3 PASSED saLogStreamOpen_2() alarm stream OK;
4 PASSED Create app stream OK;
5 PASSED Create and open app stream;
6 PASSED saLogStreamOpen_2() - NULL ptr to handle;
7 PASSED saLogStreamOpen_2() - NULL logStreamName;
8 PASSED Open app stream second time with altered logFileName;
9 PASSED Open app stream second time with altered logFilePathName;
10 PASSED Open app stream second time with altered logFileFmt;
11 PASSED Open app stream second time with altered maxLogFileSize;
12 PASSED Open app stream second time with altered maxLogRecordSize;
13 PASSED Open app stream second time with altered maxFilesRotated;
14 PASSED Open app stream second time with altered haProperty;
15 PASSED Open app with logFileFmt == NULL;
16 PASSED Open app stream second time with logFileFmt == NULL;
17 PASSED Open app stream with NULL logFilePathName;
18 PASSED Open app stream with '.' logFilePathName;
19 PASSED Open app stream with invalid logFileFmt;
20 PASSED Open app stream with unsupported logFullAction;
21 PASSED Open non exist app stream with NULL create attrs;
22 PASSED saLogStreamOpenAsync_2(), Not supported;
23 PASSED saLogStreamOpenCallbackT() OK;
24 PASSED saLogWriteLog(), Not supported;
25 PASSED saLogWriteAsyncLog() system OK;
26 PASSED saLogWriteAsyncLog() alarm OK;
27 PASSED saLogWriteAsyncLog() notification OK;
28 PASSED saLogWriteAsyncLog() with NULL logStreamHandle;
29 PASSED saLogWriteAsyncLog() with invalid logStreamHandle;
30 PASSED saLogWriteAsyncLog() with invalid ackFlags;
31 PASSED saLogWriteAsyncLog() with NULL logRecord ptr;
32 PASSED saLogWriteAsyncLog() logSvcUsrName == NULL;
33 PASSED saLogWriteAsyncLog() logSvcUsrName == NULL and envset;
34 PASSED saLogWriteAsyncLog() with logTimeStamp set;
35 PASSED saLogWriteAsyncLog() without logTimeStamp set;
36 PASSED saLogWriteAsyncLog() 1800 bytes logrecord (ticket #203);
37 PASSED saLogWriteAsyncLog() invalid severity;
38 PASSED saLogWriteLogAsync() logBufSize > strlen(logBuf) + 1;
39 PASSED saLogWriteLogAsync() logBufSize > SA_LOG_MAX_RECORD_SIZE;
40 PASSED saLogWriteLogCallbackT() SA_DISPATCH_ONE;
41 PASSED saLogWriteLogCallbackT() SA_DISPATCH_ALL;
42 PASSED saLogFilterSetCallbackT OK;
43 PASSED saLogStreamClose OK;
44 PASSED saLogStreamOpen_2 with maxFilesRotated = 0, ERR;
45 PASSED saLogStreamOpen_2 with maxFilesRotated = 128, ERR;
46 PASSED saLogStreamOpen_2 with logFileName > 218 characters, ERR;
47 PASSED saLogStreamOpen_2 with invalid filename;
48 PASSED saLogStreamOpen_2 with maxLogRecordSize > MAX_RECSIZE, ERR;
49 PASSED saLogStreamOpen_2 with maxLogRecordSize < 150, ERR;
50 PASSED saLogStreamOpen_2 with stream number out of the limitation, ERR;
51 PASSED saLogInitialize() then saLogFinalize() multiple times. keep MDS connection, OK;
52 PASSED saLogInitialize() then saLogFinalize() multiple times in multiple threads, OK;
Suite 3: Limit Fetch API
1 PASSED saLogLimitGet(), Not supported;
Suite 4: LOG OI tests, stream objects
1 PASSED CCB Object Modify saLogStreamFileName;
2 PASSED CCB Object Modify saLogStreamPathName, ERR not allowed;
3 PASSED CCB Object
[tickets] [opensaf:tickets] #1990 AMF : Extra notification is received for lock operation on unlocked SG.
- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.1.RC2 --> 4.7.2
- **Comment**:

Analysis: Except for 2N, the problem exists for all other redundancy models. In the code for these models, when the removal response for the last SU arrives, there will no longer be any SU in the oper list. Based on this, AMFD again tries to mark the SG locked, and this results in the extra notification.

---

** [tickets:#1990] AMF : Extra notification is received for lock operation on unlocked SG.**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Wed Aug 31, 2016 06:40 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 10:10 AM UTC
**Owner:** Praveen

Changeset : 5.1 FC (7997 changeset)

Extra notification is received for lock operation on unlocked SG.

amf-adm lock safSg=AmfDemo,safApp=AmfDemo

=== Aug 30 15:22:27 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_UNLOCKED
New State: SA_AMF_ADMIN_LOCKED

=== Aug 30 15:22:27 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED
[tickets] [opensaf:tickets] #1969 smf: One step upgrade with cluster reboot does not wait for nodes to start
I think a separate AMF ticket should be written for the AMF part of this problem. However, even if the AMF problem is solved, I think SMF should be fixed to handle this in a better way, e.g. by having a configurable timeout for waiting for nodes.

---

** [tickets:#1969] smf: One step upgrade with cluster reboot does not wait for nodes to start**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Wed Aug 24, 2016 01:01 PM UTC by elunlen
**Last Updated:** Fri Sep 09, 2016 12:38 PM UTC
**Owner:** nobody

When using the one step upgrade feature with a cluster reboot, all nodes will restart, including the SC-nodes. This is done as the last action in the upgrade step. After the active SC-node is up again, SMF will continue with the procedure wrapup. When collecting information in order to prepare the wrapup, the node destination for all nodes in the campaign is requested. However, this information can only be collected from nodes that are started and have joined the cluster (unlocked).

The problem is that SMF does not seem to wait in order to give all nodes a chance to join the cluster, and if SMF fails to get the node destination from any of the nodes the campaign will fail, as seen in the log below. When reading the node destination there is a 10 sec “try again” loop waiting for “node up” for each node. It is not unlikely that the active SC-node comes up before some of the other nodes, and that it will take more than 10 sec after that before some of the other nodes join the cluster. If that is the case, the campaign will fail.
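The configurable timeout suggested in the comment on this ticket could take the shape of a polling loop whose deadline is a parameter rather than a fixed 10 seconds. The following is a minimal sketch under stated assumptions, not SMF code: `wait_for_node`, `node_up_fn` and `up_after_n_polls` are hypothetical names, and the actual "node up" check is abstracted behind a callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true once the node has joined the cluster. */
typedef bool (*node_up_fn)(void *ctx);

/* Milliseconds elapsed since 'start' on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll is_up() every poll_ms milliseconds until it succeeds or timeout_ms
 * has elapsed. Returns true if the node came up in time, false otherwise. */
bool wait_for_node(node_up_fn is_up, void *ctx, long timeout_ms, long poll_ms)
{
	struct timespec start;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if (is_up(ctx))
			return true;
		if (elapsed_ms(&start) >= timeout_ms)
			return false;
		struct timespec pause = { poll_ms / 1000,
					  (poll_ms % 1000) * 1000000L };
		nanosleep(&pause, NULL);
	}
}

/* Test double: reports "up" after *ctx further polls have been made. */
bool up_after_n_polls(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

The timeout would be read from configuration and applied per node, or once for the whole set of nodes, depending on how long a complete cluster reboot is expected to take; that choice is left open here.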
[tickets] [opensaf:tickets] #1924 PLM: resurrect PLM test suite
- **Milestone**: 4.7.2 --> 5.1.FC --- ** [tickets:#1924] PLM: resurrect PLM test suite** **Status:** fixed **Milestone:** 5.1.FC **Created:** Wed Jul 20, 2016 08:20 PM UTC by Alex Jones **Last Updated:** Thu Aug 04, 2016 11:23 AM UTC **Owner:** Alex Jones The PLM test suite is currently removed from the build because it doesn't compile. It can't even be run because it needs specific OpenHPI and IMM configuration. This ticket aims to resurrect the PLM test suite.
[tickets] [opensaf:tickets] #1994 IMMSv: Finalized CCB are counted under Max Ccb Limit
- **status**: accepted --> review
- **Milestone**: 5.1.RC2 --> 5.1.RC1

---

** [tickets:#1994] IMMSv: Finalized CCB are counted under Max Ccb Limit**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Thu Sep 01, 2016 12:32 PM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 10:09 AM UTC
**Owner:** Neelakanta Reddy

setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
1PBE with 30K objects

- Default maxCcb is configured to 1 as in object opensafImm=opensafImm,safApp=safImmService
- Try creating more than 1 Ccb operations

~~~
for (( i = 1; i <= 2; i++ )); do
    immcfg -c TestClass testClass=$i
done
~~~

The above operation fails with ERR_NO_RESOURCES after the Ccb count for the cluster reaches 1.

Even when the max limit has been reached, more Ccbs are allowed after a few minutes. See the syslog snippet below:

Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45008 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45009 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45010 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45011 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45012 COMMITTED (chaniTestClass)
**Sep 1 *14:58:35* OSAF-SC1 osafimmnd[27298]: *NO ERR_NO_RESOURCES: maximum Ccbs limit 2 has been reached for the cluster***
Sep 1 15:00:34 OSAF-SC1 syslog-ng[1194]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=92951', processed='center(received)=47084', processed='destination(messages)=47077', processed='destination(mailinfo)=7', processed='destination(mailwarn)=0', processed='destination(localmessages)=45786', processed='destination(newserr)=0', processed='destination(mailerr)=0', processed='destination(netmgm)=0', processed='destination(warn)=42', processed='destination(console)=16', processed='destination(null)=0', processed='destination(mail)=7', processed='destination(xconsole)=16', processed='destination(firewall)=0', processed='destination(acpid)=0', processed='destination(newscrit)=0', processed='destination(newsnotice)=0', processed='source(src)=47084'
**Sep 1 *15:10:14* OSAF-SC1 osafimmnd[27298]: *NO Ccb 45014 COMMITTED (chaniTestClass)***
Sep 1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45015 COMMITTED (chaniTestClass)
Sep 1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45016 COMMITTED (chaniTestClass)
Sep 1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45017 COMMITTED (chaniTestClass)
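The behaviour the title of #1994 asks for can be sketched as a limit check that only counts CCBs that are still open. This is not the IMMND code: `ccb_state`, `count_open_ccbs` and `ccb_admission_allowed` are hypothetical names, and the three states are a simplification made for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified CCB life-cycle states. */
typedef enum { CCB_ACTIVE, CCB_COMMITTED, CCB_ABORTED } ccb_state;

/* Count only CCBs that are still open; committed/aborted ones are ignored. */
size_t count_open_ccbs(const ccb_state *ccbs, size_t n)
{
    size_t open = 0;
    for (size_t i = 0; i < n; i++) {
        if (ccbs[i] == CCB_ACTIVE)
            open++;
    }
    return open;
}

/* A new CCB is admitted only while the open count is below the limit. */
int ccb_admission_allowed(const ccb_state *ccbs, size_t n, size_t max_ccbs)
{
    return count_open_ccbs(ccbs, n) < max_ccbs;
}
```

Under this rule a table holding many committed CCBs and a single active one still leaves room when the limit is 2, which is what the reporter expects.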
[tickets] [opensaf:tickets] #2029 imm: fevs message lost during failover
---

** [tickets:#2029] imm: fevs message lost during failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Tue Sep 13, 2016 11:05 AM UTC
**Owner:** nobody
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2029/attachment/logs.7z) (256.4 kB; application/octet-stream)

There's fevs message loss when failing over between 2 SCs.

~~~
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. Marking it as doomed 232 <754, 2010f> (@OpenSafImmPBE)
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. Marking it as doomed 233 <755, 2010f> (OsafImmPbeRt_B)
...
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer disconnected 233 <755, 2010f> (OsafImmPbeRt_B)
~~~

The IMMNDs never receive the D2ND_DISCARD_IMPL for @OpenSafImmPBE, so that applier keeps being marked as dying.

~~~
Sep 8 11:50:02 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:50:03 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:50:04 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
...
Sep 8 11:59:08 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:59:09 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:59:10 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
...
~~~

Details of the problem are explained here:
http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjAKFElnhCTUywCYDxo5EUN0A5LAZzzzACMB7AD2S8AbgFMw5bDgA0TKgC45OecgDKAFQCCrAEIBNZAFpJC5JoDC61ADUAogB0EjgGajhbZDF4BXJOORQHjgADAAsAMwAbJxMrB6GAMRgogAmAHwmlCqu7gD6ALaibGwgAOainDwCQmISclk5Hl6+EP6ByCERAOyVfIIi-vXyAFSjbKIIKcj53DDTbKXIzrwSjfOLneFdjtzeKFAoXoUeELwmOMgANiCtMRSURgncl96iGbHs2W5sBQvIbBAQPlgKlkAB3A4ACw6YS2vWqAzqFGUqghEBg0NOZksNls-0BrUcOz2yEhIDYCAA5ChSrwUBBIaJprN1kswLx8pkiQg2GcGUy1s0-BJ2gCoJdLjCItE8B94klUu9kV88scSuV4f1aud5IKfMKAkFYdsnAhuKIYCBvONzvjxZKyRTqchkjBRFAxFN+cy5vkFtJHAd8UDgCdGUtvqy0dDNibHJoYBBvCAJQBPaTIb1rP2LNiQnyXKbm4PA0HRmHhUIADjuUkez1eSpYnwjqr+AJDZahUrhXD6NUGmDiiiH7ACpQQKyZKW8wEusBuoOzfwAFLGAJTc3mZ8NqspM2Nsjm29qXXgA5AAQivN+vd9vfYR2qUI1GrvdYh9rOWq0jOZ7cYIOo4Z6i0bQeCmyQgCkqYAdyADqmjIOgRTqkyQoQPIh4ANQdFeAC8AF4EAA

~~~
Sep 8 11:50:00 SC-2-1 osafimmd[4226]: WA IMMND DOWN on active controller 2 detected at standby immd!! 1. Possible failover
...
Sep 8 11:50:00 SC-2-1 osafimmd[4226]: WA Message count:10437 + 1 != 10437
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: WA DISCARD DUPLICATE FEVS message:10437
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: WA Error code 2 returned for message type 82 - ignoring
~~~

The logs are attached.
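The "Message count:10437 + 1 != 10437" and "DISCARD DUPLICATE FEVS message:10437" lines in #2029 suggest a sequence-number check on the receiving side. The following is a minimal sketch of such a check, not the IMMD/IMMND implementation: `fevs_classify` and the `fevs_verdict` values are hypothetical names, and counter wrap-around is ignored.

```c
#include <stdint.h>

/* Hypothetical outcome of comparing an incoming FEVS sequence number
 * with the last one this node has processed. */
typedef enum { FEVS_IN_ORDER, FEVS_DUPLICATE, FEVS_GAP } fevs_verdict;

fevs_verdict fevs_classify(uint64_t last_seen, uint64_t incoming)
{
    if (incoming == last_seen + 1)
        return FEVS_IN_ORDER;   /* the expected next message */
    if (incoming <= last_seen)
        return FEVS_DUPLICATE;  /* already processed: would be discarded */
    return FEVS_GAP;            /* at least one message in between is missing */
}
```

In the log above the incoming number equals the last processed one, so a check of this kind treats the message as a duplicate and drops it. If the sender reused number 10437 for a different message after the failover, that would match the loss described in the ticket.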
[tickets] [opensaf:tickets] #1431 PLM: support virtualization of EEs
- **status**: assigned --> fixed
- **Milestone**: 5.2.FC --> 5.1.FC
- **Comment**:

Closing this ticket so that it appears in the list of OpenSAF 5.1 enhancements. Please open a new ticket for OpenSAF 5.1.FC if you wish to continue working on this feature.

---

** [tickets:#1431] PLM: support virtualization of EEs**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Fri Jul 31, 2015 03:34 PM UTC by Alex Jones
**Last Updated:** Mon Aug 29, 2016 12:07 PM UTC
**Owner:** Alex Jones

This ticket is for adding virtualization support for EEs in PLM.
[tickets] [opensaf:tickets] #1988 AMF: Admin operation continuation does not work with short cluster init timeout
- **Milestone**: 5.2.FC --> 5.1.RC2

---

** [tickets:#1988] AMF: Admin operation continuation does not work with short cluster init timeout**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 12:04 AM UTC by Minh Hon Chau
**Last Updated:** Mon Sep 12, 2016 01:26 AM UTC
**Owner:** Minh Hon Chau

In the scenario of admin continuation after headless, if saAmfClusterStartupTimeout is configured with a short value, the admin continuation is initiated when saAmfClusterStartupTimeout expires while the SU is still OUT OF SERVICE. The eventual result is failure of the admin operation after headless.
[tickets] [opensaf:tickets] #2028 log: write_log_record_hdl get bad file descriptor
- **status**: accepted --> review

---

** [tickets:#2028] log: write_log_record_hdl get bad file descriptor**

**Status:** review
**Milestone:** 5.0.1
**Created:** Tue Sep 13, 2016 09:54 AM UTC by Vu Minh Nguyen
**Last Updated:** Tue Sep 13, 2016 09:54 AM UTC
**Owner:** Vu Minh Nguyen

In the current code, logsv passes the `WRITE REQUEST` to the handle thread even when the file descriptor is invalid.

Here is some code from log_stream_write_h()@lgs_stream.cc:

``` C
log_initiate_stream_files(stream);

if (*stream->p_fd == -1) {
	TRACE("%s - Initiating stream files \"%s\" Failed", __FUNCTION__,
	      stream->name.c_str());
} else {
	TRACE("%s - stream files initiated", __FUNCTION__);
}
```

In that case (`p_fd = -1`), `log_stream_write_h` should tell the client to TRY_AGAIN by returning the value `(-2)`.

Besides, there is another problem at file closing. Look at the functions `fileclose_hdl` and `fileclose_h`. The file descriptor should be set to `invalid` in `fileclose_hdl`; otherwise the `close file` request will be re-sent to the file handle thread even though that file is already closed.

The above cases usually happen when the file system is busy. Osaflogd TRACE:

> 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or resource busy
> 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or resource busy
> 2016-07-02 00:32:52 SC-1 osaflogd[460]: ER write_log_record_hdl - write FAILED: Bad file descriptor
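The two guards proposed in #2028 can be sketched in isolation as follows. This is an illustration under assumptions, not the lgs_stream.cc code: `stream_t`, `precheck_write` and `begin_close` are hypothetical names, and only the `-1` (invalid descriptor) and `-2` (try again) values come from the ticket text.

```c
#define FD_INVALID      (-1)  /* invalid descriptor, as in the ticket */
#define WRITE_TRY_AGAIN (-2)  /* value the ticket says should be returned */

/* Hypothetical stand-in for the per-stream state. */
typedef struct { int fd; } stream_t;

/* Before forwarding a write request: refuse early if the fd is invalid.
 * Returns 0 when the request may be forwarded to the file handle thread. */
int precheck_write(const stream_t *s)
{
    return (s->fd == FD_INVALID) ? WRITE_TRY_AGAIN : 0;
}

/* Before forwarding a close request: invalidate the fd first so a second
 * close is not sent for the same file. Returns 1 if a close is needed. */
int begin_close(stream_t *s)
{
    if (s->fd == FD_INVALID)
        return 0;           /* already closed: nothing to send */
    s->fd = FD_INVALID;     /* mark closed before handing off */
    return 1;
}
```

Once `begin_close` has run, a later write is answered with the try-again value instead of reaching `write()` with a stale descriptor.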
[tickets] [opensaf:tickets] #1837 TIPC: loading model gives: "osafimmpbed: ER Failed in saImmOmSearchNext_2:5 - exiting" and "osafimmpbed: ER immpbe.cc dumpObjectsToPbe failed - exiting (line:265)
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1837] TIPC: loading model gives: "osafimmpbed: ER Failed in saImmOmSearchNext_2:5 - exiting" and "osafimmpbed: ER immpbe.cc dumpObjectsToPbe failed - exiting (line:265)**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed May 18, 2016 05:41 AM UTC by beatriz brandao
**Last Updated:** Tue Aug 30, 2016 08:58 AM UTC
**Owner:** nobody
**Attachments:**

- [C:\Docs\lixo\osaftestLog-2016-04-19_04-04-26.gz](https://sourceforge.net/p/opensaf/tickets/1837/attachment/C%3A%5CDocs%5Clixo%5CosaftestLog-2016-04-19_04-04-26.gz) (1.4 MB; application/x-gzip-compressed)

Testcase: osaftest.tests.amf.functest.config_changes.test_comptype_attr_chg.Test.test_chg_ct_def_disable_restart

Note: this testcase is run with TIPC enabled.

Testcase starts @: 2016-04-19 03:44:28 INFO - TestCase:setUp Start | test_chg_ct_def_disable_restart (osaftest.tests.amf.functest.config_changes.test_comptype_attr_chg.Test)
Testcase ends @: 2016-04-19 03:45:16 DEBUG: Powered off cluster

First analysis done by Zoran:

From the syslogs, I cannot see what caused ERR_TIMEOUT in searchNext(). According to the MDS logs, it seems that this might be an MDS problem.

From MDS logs:
Apr 19 3:44:36.237379 osaflogd[446] NOTIFY |MDTM: svc up event for svc_id = LGA(21), subscri. by svc_id = LGS(20) pwe_id=1 Adest =
Apr 19 3:44:36.238518 osafntfd[461] NOTIFY |MDTM: svc up event for svc_id = NTFA(29), subscri. by svc_id = NTFS(28) pwe_id=1 Adest =
Apr 19 3:44:36.239261 osafclmd[477] NOTIFY |MDTM: svc up event for svc_id = CLMA(35), subscri. by svc_id = CLMS(34) pwe_id=1 Adest =
Apr 19 3:44:38.788267 osaflogd[446] NOTIFY |MDTM: svc up event for svc_id = LGA(21), subscri. by svc_id = LGS(20) pwe_id=1 Adest =
Apr 19 3:44:44.911298 osafimmpbed[453] ERR |MDS_SND_RCV: Timeout or Error occured
Apr 19 3:44:44.912049 osafimmpbed[453] ERR |MDS_SND_RCV: Timeout occured on sndrsp message
Apr 19 3:44:44.912128 osafimmpbed[453] ERR |MDS_SND_RCV: Adest=<0x0002010f,1637493776>
Apr 19 3:44:44.919827 osafimmnd[432] NOTIFY |MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = IMMND(25) pwe_id=1 Adest =
Apr 19 3:44:45.413550 osafimmpbed[679] NOTIFY |BEGIN MDS LOGGING| PID= | ARCHW=a|64bit=1

There was no MDS message between 3:44:38.788267 and 3:44:44.911298. At 3:44:44.911298, the MDS send/receive of the PBE request timed out.
[tickets] [opensaf:tickets] #1895 ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1895] ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Fri Jun 24, 2016 03:24 AM UTC by Vo Minh Hoang
**Last Updated:** Thu Sep 01, 2016 11:33 AM UTC
**Owner:** Canh Truong
**Attachments:**

- [osafntfd.txt](https://sourceforge.net/p/opensaf/tickets/1895/attachment/osafntfd.txt) (705.5 kB; text/plain)

Failure when running test suite 2:

osafntfd [463:ntfs_evt.c:0338] >> proc_unsubscribe_msg: client_id 28, subscriptionId 111
osafntfd [463:NtfAdmin.cc:0553] ER NtfAdmin::subscriptionRemoved client 28 not found
osafntfd [463:ntfs_evt.c:0341] << proc_unsubscribe_msg

Currently, when finalizing the last client, ntfa uninstalls the MDS connection. This causes the NCSMDS_DOWN event to be sent to ntfs, and ntfs removes all clients related to this MDS. But if a new client is initialized immediately after finalizing, ntfs may receive the initialization message before the NCSMDS_DOWN event. As a result, the new client is removed without being finalized, and the subscribe action then fails.

Similar ticket: https://sourceforge.net/p/opensaf/tickets/1818/
[tickets] [opensaf:tickets] #1929 osaf: Build fails with GCC 6.1.0
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1929] osaf: Build fails with GCC 6.1.0**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Tue Aug 02, 2016 09:21 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Aug 30, 2016 08:57 AM UTC
**Owner:** A V Mahesh (AVM)

OpenSAF fails to build with GCC 6.1.0, due to new compiler warnings:

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.1.0/configure --prefix=/usr --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-languages=c,c++ --disable-multilib --disable-bootstrap --with-system-zlib --with-gmp=/usr/local/gmp-6.1.1 --with-mpfr=/usr/local/mpfr-3.1.4 --with-mpc=/usr/local/mpc-1.0.3
Thread model: posix
gcc version 6.1.0 (GCC)

make[5]: Entering directory `/avm/opensaf/osaf/tools/safimm/immdump'
g++ -DHAVE_CONFIG_H -I. -I../../../.. -DSA_EXTENDED_NAME_SOURCE -I../../../../osaf/libs/saf/include -I../../../../osaf/libs/core/include -I../../../../osaf/libs/core/leap/include -I../../../../osaf/libs/core/mds/include -I../../../../osaf/libs/core/common/include -I../../../../osaf/libs/common/immsv/include -Wall -fno-strict-aliasing -Werror -fPIC -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -fstack-protector -DINTERNAL_VERSION_ID='""' -I/usr/include/libxml2 -g -O2 -MT immdump-imm_dumper.o -MD -MP -MF .deps/immdump-imm_dumper.Tpo -c -o immdump-imm_dumper.o `test -f 'imm_dumper.cc' || echo './'`imm_dumper.cc
imm_dumper.cc: In function ‘int main(int, char**)’:
imm_dumper.cc:144:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
     if ((c = getopt_long(argc, argv, "hp:x:c:", long_options, NULL)) == -1)
     ^~
imm_dumper.cc:147:13: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’
             switch (c) {
             ^~
cc1plus: all warnings being treated as errors
make[5]: *** [immdump-imm_dumper.o] Error 1
make[5]: Leaving directory `/avm/opensaf/osaf/tools/safimm/immdump'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/avm/opensaf/osaf/tools/safimm'
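The shape that triggers the warning in #1929, and one way to write it so that the indentation matches the control flow, can be sketched as follows. This is not the imm_dumper.cc code: `classify_option` is a hypothetical helper, and only the option letters are borrowed from the `"hp:x:c:"` string in the error message.

```c
/*
 * Pattern that GCC 6 flags (shown in a comment so this sketch compiles
 * cleanly with -Wall -Werror):
 *
 *     if (c == -1)
 *         break;
 *         switch (c) { ... }    <- indented as if guarded by the if
 *
 * Bracing the guarded statement and aligning the switch with the if
 * makes the structure explicit.
 */
int classify_option(int c)
{
    if (c == -1) {
        return 0;               /* end of options */
    }
    switch (c) {
    case 'h':
        return 1;               /* help flag, takes no argument */
    case 'p':
    case 'x':
    case 'c':
        return 2;               /* options that take an argument */
    default:
        return -1;              /* unknown option */
    }
}
```

Because the warning is only about layout, a re-indentation like this does not change behaviour.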
[tickets] [opensaf:tickets] #1968 SMF does not handle AMF long DN&RDN support
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1968] SMF does not handle AMF long DN&RDN support**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Wed Aug 24, 2016 12:51 PM UTC by elunlen
**Last Updated:** Mon Sep 12, 2016 01:11 PM UTC
**Owner:** elunlen

SMF already supports long DN. However, there are some checks on AMF-related objects that do not allow some DNs to be longer than 255 (RDN 64). These checks shall be removed since AMF will support long DN from 5.1.
[tickets] [opensaf:tickets] #1983 plm: Build failure with gcc 6.1.1 on 32-bit system
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1983] plm: Build failure with gcc 6.1.1 on 32-bit system**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Mon Aug 29, 2016 08:44 PM UTC by Anders Widell
**Last Updated:** Tue Aug 30, 2016 08:56 AM UTC
**Owner:** nobody

PLM fails to build with gcc 6.1.1 on a 32-bit system:

~~~
make[3]: Entering directory '/home/opensaf/opensaf-staging/tests/plmsv/plms'
  CC       plmtest-test_saPlmEntityGroupAdd.o
test_saPlmEntityGroupAdd.c: In function 'saPlmEntityGroupAdd_05':
test_saPlmEntityGroupAdd.c:57:28: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
    rc=saPlmEntityGroupAdd((SaPlmEntityGroupHandleT)&entityGroupHandle, &f120_slot_1_dn , entityNamesNumber,SA_PLM_GROUP_SINGLE_ENTITY);
                           ^
cc1: all warnings being treated as errors
Makefile:638: recipe for target 'plmtest-test_saPlmEntityGroupAdd.o' failed
~~~
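A common way to silence this class of warning in #1983 is to convert the pointer through `uintptr_t` before widening it. The sketch below assumes the handle type is a 64-bit integer, which the size mismatch on a 32-bit target suggests. `handle_t` and `bogus_handle_from_ptr` are hypothetical stand-ins and not PLM definitions.

```c
#include <stdint.h>

/* Stand-in for a 64-bit SA handle type (an assumption for this sketch). */
typedef uint64_t handle_t;

/*
 * Build a deliberately meaningless handle value from an address, as a
 * negative test might do. Converting via uintptr_t first keeps the
 * pointer-to-integer cast the same width as the pointer; the widening
 * to 64 bits is then an ordinary integer conversion.
 */
handle_t bogus_handle_from_ptr(const void *p)
{
    return (handle_t)(uintptr_t)p;
}
```

Whether the test really intends to pass an address as a handle is a separate question that the ticket does not answer.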
[tickets] [opensaf:tickets] #1986 log: logtest fails when run after immomtest
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1986] log: logtest fails when run after immomtest**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Tue Aug 30, 2016 09:03 AM UTC by Anders Widell
**Last Updated:** Tue Aug 30, 2016 09:06 AM UTC
**Owner:** Vu Minh Nguyen

If I first run immomtest and then logtest, I get the following result:

~~~
Suite 1: Library Life Cycle
    1 PASSED saLogInitialize() OK
    2 PASSED saLogInitialize() with NULL pointer to handle
    3 PASSED saLogInitialize() with NULL pointer to callbacks
    4 PASSED saLogInitialize() with NULL callbacks AND version
    5 PASSED saLogInitialize() with uninitialized handle
    6 PASSED saLogInitialize() with uninitialized version
    7 PASSED saLogInitialize() with too high release level
    8 PASSED saLogInitialize() with minor version set to 1
    9 PASSED saLogInitialize() with major version set to 3
   10 PASSED saLogSelectionObjectGet() OK
   11 PASSED saLogSelectionObjectGet() with NULL log handle
   12 PASSED saLogDispatch() OK
   13 PASSED saLogFinalize() OK
   14 PASSED saLogFinalize() with NULL log handle

Suite 2: Log Service Operations
    1 PASSED saLogStreamOpen_2() system stream OK
    2 PASSED saLogStreamOpen_2() notification stream OK
    3 PASSED saLogStreamOpen_2() alarm stream OK
    4 PASSED Create app stream OK
    5 PASSED Create and open app stream
    6 PASSED saLogStreamOpen_2() - NULL ptr to handle
    7 PASSED saLogStreamOpen_2() - NULL logStreamName
    8 PASSED Open app stream second time with altered logFileName
    9 PASSED Open app stream second time with altered logFilePathName
   10 PASSED Open app stream second time with altered logFileFmt
   11 PASSED Open app stream second time with altered maxLogFileSize
   12 PASSED Open app stream second time with altered maxLogRecordSize
   13 PASSED Open app stream second time with altered maxFilesRotated
   14 PASSED Open app stream second time with altered haProperty
   15 PASSED Open app with logFileFmt == NULL
   16 PASSED Open app stream second time with logFileFmt == NULL
   17 PASSED Open app stream with NULL logFilePathName
   18 PASSED Open app stream with '.' logFilePathName
   19 PASSED Open app stream with invalid logFileFmt
   20 PASSED Open app stream with unsupported logFullAction
   21 PASSED Open non exist app stream with NULL create attrs
   22 PASSED saLogStreamOpenAsync_2(), Not supported
   23 PASSED saLogStreamOpenCallbackT() OK
   24 PASSED saLogWriteLog(), Not supported
   25 PASSED saLogWriteAsyncLog() system OK
   26 PASSED saLogWriteAsyncLog() alarm OK
   27 PASSED saLogWriteAsyncLog() notification OK
   28 PASSED saLogWriteAsyncLog() with NULL logStreamHandle
   29 PASSED saLogWriteAsyncLog() with invalid logStreamHandle
   30 PASSED saLogWriteAsyncLog() with invalid ackFlags
   31 PASSED saLogWriteAsyncLog() with NULL logRecord ptr
   32 PASSED saLogWriteAsyncLog() logSvcUsrName == NULL
   33 PASSED saLogWriteAsyncLog() logSvcUsrName == NULL and envset
   34 PASSED saLogWriteAsyncLog() with logTimeStamp set
   35 PASSED saLogWriteAsyncLog() without logTimeStamp set
   36 PASSED saLogWriteAsyncLog() 1800 bytes logrecord (ticket #203)
   37 PASSED saLogWriteAsyncLog() invalid severity
   38 PASSED saLogWriteLogAsync() logBufSize > strlen(logBuf) + 1
   39 PASSED saLogWriteLogAsync() logBufSize > SA_LOG_MAX_RECORD_SIZE
   40 PASSED saLogWriteLogCallbackT() SA_DISPATCH_ONE
   41 PASSED saLogWriteLogCallbackT() SA_DISPATCH_ALL
   42 PASSED saLogFilterSetCallbackT OK
   43 PASSED saLogStreamClose OK
   44 PASSED saLogStreamOpen_2 with maxFilesRotated = 0, ERR
   45 PASSED saLogStreamOpen_2 with maxFilesRotated = 128, ERR
   46 PASSED saLogStreamOpen_2 with logFileName > 218 characters, ERR
   47 PASSED saLogStreamOpen_2 with invalid filename
   48 PASSED saLogStreamOpen_2 with maxLogRecordSize > MAX_RECSIZE, ERR
   49 PASSED saLogStreamOpen_2 with maxLogRecordSize < 150, ERR
   50 PASSED saLogStreamOpen_2 with stream number out of the limitation, ERR
   51 PASSED saLogInitialize() then saLogFinalize() multiple times. keep MDS connection, OK
   52 PASSED saLogInitialize() then saLogFinalize() multiple times in multiple threads, OK

Suite 3: Limit Fetch API
    1 PASSED saLogLimitGet(), Not supported

Suite 4: LOG OI tests, stream objects
    1 PASSED CCB Object Modify saLogStreamFileName
    2 PASSED CCB Object Modify saLogStreamPathName, ERR not allowed
    3 PASSED CCB O
~~~
[tickets] [opensaf:tickets] #1990 AMF : Extra notification is received for lock operation on unlocked SG.
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1990] AMF : Extra notification is received for lock operation on unlocked SG.**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 06:40 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 06:40 AM UTC
**Owner:** nobody

Changeset : 5.1 FC (7997 changeset)

Extra notification is received for lock operation on unlocked SG.

amf-adm lock safSg=AmfDemo,safApp=AmfDemo

=== Aug 30 15:22:27 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_UNLOCKED
New State: SA_AMF_ADMIN_LOCKED

=== Aug 30 15:22:27 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED
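In #1990 the second notification reports the same old and new state (LOCKED to LOCKED). A guard of the following kind would suppress it. This is only a sketch: `adm_state`, `sg_t` and `sg_set_admin_state` are hypothetical names that do not come from amfd, and the counter stands in for actually sending a notification.

```c
/* Hypothetical, reduced admin states for illustration. */
typedef enum { ADM_UNLOCKED, ADM_LOCKED } adm_state;

typedef struct {
    adm_state state;
    unsigned notifications_sent;   /* stands in for sending an NTF notification */
} sg_t;

/* Update the admin state and notify only when the value really changes. */
void sg_set_admin_state(sg_t *sg, adm_state new_state)
{
    if (sg->state == new_state)
        return;                    /* no transition: nothing to report */
    sg->state = new_state;
    sg->notifications_sent++;
}
```

With this guard, setting LOCKED twice in a row during one lock operation produces a single state change notification.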
[tickets] [opensaf:tickets] #1991 AMF: Existing PG tracking should not be stopped for CURRENT flag
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1991] AMF: Existing PG tracking should not be stopped for CURRENT flag**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 09:44 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 09:44 AM UTC
**Owner:** nobody

5.1.FC : changeset - 6997

Issue : Existing PG tracking should not be stopped for CURRENT call

Steps performed :
-> Call saAmfInitialize_4()
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CHANGES flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrackStop()

Observed output : TrackStop returns ERR_NOT_EXIST, indicating that tracking was not started earlier.

Expected output: the TrackStop() api should return SA_AIS_OK; in the earlier release, the api returns SA_AIS_OK.

According to the B04.01 spec 7.11.1 page 318, tracking should not be stopped until TrackStop() is called explicitly.

Once saAmfProtectionGroupTrack_4() has been called with trackFlags containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification callbacks can only be stopped by an invocation of saAmfProtectionGroupTrackStop().
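The expected behaviour in #1991 can be modelled with a single "tracking active" flag that only the stop call clears. This is a toy model, not the AMF agent code: `pg_client`, `pg_track`, `pg_track_stop` and the `TRACK_*` bit values are local to the sketch.

```c
/* Local flag bits for the sketch (names and values are assumptions). */
#define TRACK_CURRENT      0x01u
#define TRACK_CHANGES      0x02u
#define TRACK_CHANGES_ONLY 0x04u

/* Hypothetical per-client tracking state. */
typedef struct { int tracking; } pg_client;

/* A track call starts change tracking only for the CHANGES flags.
 * TRACK_CURRENT alone is a one-shot query and leaves the state alone. */
void pg_track(pg_client *c, unsigned flags)
{
    if (flags & (TRACK_CHANGES | TRACK_CHANGES_ONLY))
        c->tracking = 1;
}

/* Stop tracking. Returns 0 on success, -1 if nothing was being tracked. */
int pg_track_stop(pg_client *c)
{
    if (!c->tracking)
        return -1;
    c->tracking = 0;
    return 0;
}
```

Replaying the steps from the ticket (CURRENT, CHANGES, CURRENT, then stop) against this model makes the stop call succeed, matching the expected SA_AIS_OK.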
[tickets] [opensaf:tickets] #1993 amf: amfnd crashes during su lock if CSI attribute name or value is a long dn.
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1993] amf: amfnd crashes during su lock if CSI attribute name or value is a long dn.**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Thu Sep 01, 2016 11:09 AM UTC by Praveen
**Last Updated:** Fri Sep 09, 2016 12:24 PM UTC
**Owner:** Praveen
**Attachments:**

- [amfnd_crash.tgz](https://sourceforge.net/p/opensaf/tickets/1993/attachment/amfnd_crash.tgz) (69.4 kB; application/x-compressed)

Configuration: In the long dn amf demo, add a csi attribute for the CSI whose attribute value is a long dn.

1) Bring the configuration up.
2) Lock the SU.
3) AMFND crashes.

AMFND uses memcpy() and thus works with the original csi attribute values from csi_rec. It frees the memory in avsv_amf_cbk_free() when the CSI_SET callback arrives. During SU lock, it again tries to free the memory while deleting the record.

At AMFND and AMFD, all SaNameT handling should be done using the osaf_extended_name_alloc() API.

The issue also applies to messages related to the CSI Attribute change callback.
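The double free described in #1993 is the usual consequence of a shallow copy of a structure that owns heap memory. The sketch below shows the deep-copy alternative with plain `malloc()` as a stand-in. It does not use osaf_extended_name_alloc(), and `csi_attr`, `csi_attr_copy` and `csi_attr_free` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record that owns a heap-allocated value string. */
typedef struct { char *value; } csi_attr;

/* Deep copy: dst gets its own allocation, so the callback copy and the
 * stored record can each be freed exactly once. Returns 0 on success. */
int csi_attr_copy(csi_attr *dst, const csi_attr *src)
{
    size_t n = strlen(src->value) + 1;   /* include the terminating NUL */
    dst->value = malloc(n);
    if (dst->value == NULL)
        return -1;
    memcpy(dst->value, src->value, n);
    return 0;
}

/* Free the owned string and clear the pointer to avoid reuse. */
void csi_attr_free(csi_attr *a)
{
    free(a->value);
    a->value = NULL;
}
```

With a shallow `memcpy()` of the struct, both copies would share one pointer and the second free would hit memory that was already released. Here, freeing the copy leaves the original intact.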
[tickets] [opensaf:tickets] #1997 IMM: immnd fails to update si while bringing up opensaf with 2PBE
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1997] IMM: immnd fails to update si while bringing up opensaf with 2PBE**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Fri Sep 02, 2016 11:46 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 01:37 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [LogAMF.zip](https://sourceforge.net/p/opensaf/tickets/1997/attachment/LogAMF.zip) (432.4 kB; application/zip)

setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
2PBE enabled

Bring up opensaf on a controller with 2PBE enabled. IMMND throws an error.

Attachments: syslog, amfd and immnd traces

Sep 2 16:54:13 SLOT1 osafimmpbed: WA Start prepare for ccb: 10004/4294967300 towards slave PBE returned: '12' from Immsv
Sep 2 16:54:13 SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA update Ccb:10004/4294967300 towards PBE-B
Sep 2 16:54:13 SLOT1 osafimmpbed: NO 2PBE Error (18) in PRTA update (ccbId:10004)
**Sep 2 16:54:13 SLOT1 osafimmnd[3632]: WA update of PERSISTENT runtime attributes in object 'safSi=NoRed3,safApp=OpenSAF' REVERTED. PBE rc:18
Sep 2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18**
Sep 2 16:54:14 SLOT1 osafimmnd[3632]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db

Note-
1. OpenSAF is successfully started
2. Issue not seen with 1PBE

Once the controller is up, amf-state si gives

safSi=SC-2N,safApp=OpenSAF
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=NoRed4,safApp=OpenSAF
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed1,safApp=OpenSAF
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=NoRed2,safApp=OpenSAF
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed3,safApp=OpenSAF
        saAmfSIAdminState=UNLOCKED(1)
        saAmfSIAssignmentState=UNASSIGNED(1)
[tickets] [opensaf:tickets] #1994 IMMSv: Finalized CCB are counted under Max Ccb Limit
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1994] IMMSv: Finalized CCB are counted under Max Ccb Limit**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Thu Sep 01, 2016 12:32 PM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 06:55 AM UTC
**Owner:** Neelakanta Reddy

setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
1PBE with 30K objects

- Default maxCcb is configured to 1 as in object opensafImm=opensafImm,safApp=safImmService
- Try creating more than 1 Ccb operation

~~~
for (( i = 1; i <= 2; i++ )); do
    immcfg -c TestClass testClass=$i
done
~~~

The above operation fails with ERR_NO_RESOURCE after the Ccb count for the cluster reaches 1.

Even after the max limit has been reached, more Ccbs are allowed again a few minutes later. See the syslog snippet below.

Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45008 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45009 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45010 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45011 COMMITTED (chaniTestClass)
Sep 1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45012 COMMITTED (chaniTestClass)
**Sep 1 *14:58:35* OSAF-SC1 osafimmnd[27298]: *NO ERR_NO_RESOURCES: maximum Ccbs limit 2 has been reached for the cluster***
Sep 1 15:00:34 OSAF-SC1 syslog-ng[1194]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=92951', processed='center(received)=47084', processed='destination(messages)=47077', processed='destination(mailinfo)=7', processed='destination(mailwarn)=0', processed='destination(localmessages)=45786', processed='destination(newserr)=0', processed='destination(mailerr)=0', processed='destination(netmgm)=0', processed='destination(warn)=42', processed='destination(console)=16', processed='destination(null)=0', processed='destination(mail)=7', processed='destination(xconsole)=16', processed='destination(firewall)=0', processed='destination(acpid)=0', processed='destination(newscrit)=0', processed='destination(newsnotice)=0', processed='source(src)=47084'
**Sep 1 *15:10:14* OSAF-SC1 osafimmnd[27298]: *NO Ccb 45014 COMMITTED (chaniTestClass)***
Sep 1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45015 COMMITTED (chaniTestClass)
Sep 1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45016 COMMITTED (chaniTestClass)
Sep 1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45017 COMMITTED (chaniTestClass)
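One reading of the #1994 title is that CCB records which are already finalized keep counting toward the limit until they are cleaned up, which would explain why new CCBs are accepted again after a few minutes. A minimal sketch of that reading follows; all `demo_*` names are hypothetical and this is not the IMM implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CCB states; the real IMM model has more of them. */
typedef enum { DEMO_CCB_ACTIVE, DEMO_CCB_FINALIZED } demo_ccb_state;

/* Counts every tracked record, including finalized ones awaiting cleanup. */
size_t demo_count_all(const demo_ccb_state *ccbs, size_t n)
{
    (void)ccbs;
    return n;
}

/* Counts only the records that are still active. */
size_t demo_count_active(const demo_ccb_state *ccbs, size_t n)
{
    size_t active = 0;

    for (size_t i = 0; i < n; i++) {
        if (ccbs[i] == DEMO_CCB_ACTIVE)
            active++;
    }
    return active;
}

/* A new CCB is allowed only while the counted number is below the limit. */
bool demo_ccb_allowed(size_t counted, size_t max_ccbs)
{
    return counted < max_ccbs;
}
```

With two finalized records and a limit of 2, checking against `demo_count_all()` refuses a new CCB, while checking against `demo_count_active()` allows it.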
[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the controller**

**Status:** fixed
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 13, 2016 10:07 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog) (716.7 kB; application/octet-stream)
- [Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog) (696.4 kB; application/octet-stream)

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 1PBE enabled with 30K objects )

Summary :
--
Cluster reset happened because the assertion on SA_MAX_UNEXTENDED_NAME_LENGTH failed in msgd

Steps followed & Observed behaviour
--
1. Invoked failover
2. After a few successful failovers, the new active controller rebooted because Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. While the previous active controller was joining the cluster in the standby role, a cluster reset happened.

[Timeline: Sep 6 00:13:02 sofo-s2]

Sep 6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, dest:13)
Sep 6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed.
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: NO 'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: ER safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep 6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:

1. Syslog attached
2. msgnd & msgd trace not enabled
[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller
- **status**: review --> fixed
- **Comment**:

changeset: 8064:99410ba8cc21
parent: 8061:da089e8f337c
user: Ramesh
date: Tue Sep 13 15:01:43 2016 +0530
summary: msg: memset ilist_info and track_info to avoid garbage [#2000]

changeset: 8065:019e617955ef
branch: opensaf-5.1.x
tag: tip
parent: 8063:59a5226122ed
user: Ramesh
date: Tue Sep 13 15:02:23 2016 +0530
summary: msg: memset ilist_info and track_info to avoid garbage [#2000]

---

** [tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the controller**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 13, 2016 06:04 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog) (716.7 kB; application/octet-stream)
- [Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog) (696.4 kB; application/octet-stream)

Environment details
--
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 1PBE enabled with 30K objects )

Summary :
--
Cluster reset happened because the assertion on SA_MAX_UNEXTENDED_NAME_LENGTH failed in msgd

Steps followed & Observed behaviour
--
1. Invoked failover
2. After a few successful failovers, the new active controller rebooted because Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd. While the previous active controller was joining the cluster in the standby role, a cluster reset happened.

[Timeline: Sep 6 00:13:02 sofo-s2]

Sep 6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, dest:13)
Sep 6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed.
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: NO 'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: ER safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Sep 6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep 6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:

1. Syslog attached
2. msgnd & msgd trace not enabled
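The changeset summary for #2000 ("memset ilist_info and track_info to avoid garbage") points at uninitialized fields reaching the name length check. A minimal sketch of that failure mode is below; the `demo_*` types and the limit of 256 are assumptions for illustration, not the actual msgd structures or the changeset contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_NAME_LENGTH 256 /* assumed limit, for illustration only */

/* Hypothetical name type with an explicit length field. */
typedef struct {
    uint16_t length;
    char value[DEMO_MAX_NAME_LENGTH];
} demo_name;

/* Hypothetical record that embeds a name and is filled in field by field. */
typedef struct {
    demo_name queue_name;
    uint32_t count;
} demo_track_info;

/* Same shape as the failed assertion: the length must be below the limit. */
bool demo_name_length_ok(const demo_name *name)
{
    return name->length < DEMO_MAX_NAME_LENGTH;
}

/* Zero the whole record first, so any field that is never assigned holds 0
 * instead of whatever bytes happened to be in that memory. */
void demo_track_info_init(demo_track_info *info)
{
    memset(info, 0, sizeof(*info));
}
```

If the record is used without the initial memset(), a leftover length value can exceed the limit and trip the check even though no name was ever stored.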
[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#2001] IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 01:38 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip) (95.1 kB; application/zip)

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes
1 PBE enabled

Summary:

Steps to Reproduce

1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback for longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait, OR invoke any Ccb operation

Observed Behavior:

Step 1 returns SA_AIS_ERR_TIMEOUT (expected)
Step 2 returns SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep 6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file /tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep 6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep 6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 (testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep 6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
Sep 6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 21 - ignoring
Sep 6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down on syncronous request, discarding request
Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep 6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)

Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached
[tickets] [opensaf:tickets] #2013 IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#2013] IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Thu Sep 08, 2016 12:10 PM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 01:38 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [SearchTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2013/attachment/SearchTmOut.zip) (883.9 kB; application/zip)

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes

Summary:

Steps to Reproduce

1. Create a runtime/config object
2. Do SearchInitialize()
3. Delete the object created in Step 1
4. Do SearchNext()
5. Do SearchNext() again

Observed Behavior:

Step 4 returns SA_AIS_ERR_TIMEOUT (expected)
Step 5 returns SA_AIS_ERR_BAD_HANDLE **(SA_AIS_ERR_NOT_EXIST is expected)**

**Note: Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached
[tickets] [opensaf:tickets] #2017 Update the SMF PR document with information about faster upgrade
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#2017] Update the SMF PR document with information about faster upgrade**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Fri Sep 09, 2016 12:22 PM UTC by elunlen
**Last Updated:** Fri Sep 09, 2016 01:21 PM UTC
**Owner:** elunlen

Update the SMF PR document with information about:

* Balanced In Service Upgrade (BISU) [#1685]
* Parallel swBundle removal and installation [#1633]
* NG lock and unlock in single step upgrade [#1634]
[tickets] [opensaf:tickets] #2024 Imm doc: updattion of IMMsv PR document for 5.1
- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#2024] Imm doc: updattion of IMMsv PR document for 5.1**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Mon Sep 12, 2016 06:17 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Sep 12, 2016 06:17 AM UTC
**Owner:** Neelakanta Reddy

This defect is to update the IMMsv PR document.
[tickets] [opensaf:tickets] #2028 log: write_log_record_hdl get bad file descriptor
---

** [tickets:#2028] log: write_log_record_hdl get bad file descriptor**

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Tue Sep 13, 2016 09:54 AM UTC by Vu Minh Nguyen
**Last Updated:** Tue Sep 13, 2016 09:54 AM UTC
**Owner:** Vu Minh Nguyen

In the current code, logsv passes the `WRITE REQUEST` to the handle thread even when the file descriptor is invalid.

Here is some code from log_stream_write_h()@lgs_stream.cc:

``` C
log_initiate_stream_files(stream);

if (*stream->p_fd == -1) {
    TRACE("%s - Initiating stream files \"%s\" Failed", __FUNCTION__,
          stream->name.c_str());
} else {
    TRACE("%s - stream files initiated", __FUNCTION__);
}
```

In that case (`p_fd = -1`), `log_stream_write_h` should tell the client to TRY_AGAIN by returning the value `(-2)`.

Besides, there is another problem at file closing. Look at the functions `fileclose_hdl` and `fileclose_h`. The file descriptor should be set to `invalid` in `fileclose_hdl`; otherwise the `close file` request will be re-sent to the file handle thread even though that file is already closed.

The above cases usually happen when the file system is busy.

Osaflogd TRACE:

> 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or resource busy
> 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or resource busy
> 2016-07-02 00:32:52 SC-1 osaflogd[460]: ER write_log_record_hdl - write FAILED: Bad file descriptor
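The two changes proposed in #2028 can be sketched as follows. This is a simplified model, not the logsv code: the `demo_*` names are hypothetical, the write request is replaced by a stub, and only the value `-2` for TRY_AGAIN and the `-1` marker for an invalid descriptor come from the ticket text.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FD_INVALID (-1)   /* invalid descriptor marker, as in p_fd == -1 */
#define DEMO_RC_TRY_AGAIN (-2) /* the "(-2)" return value named in the ticket */

/* Hypothetical, stripped-down stream: only the descriptor matters here. */
typedef struct {
    int fd;
} demo_stream;

/* Stub standing in for handing a write request to the file handle thread.
 * It simply reports that all bytes were written. */
int demo_send_write_request(demo_stream *stream, size_t len)
{
    (void)stream;
    return (int)len;
}

/* Refuse to forward the request while the descriptor is invalid, and let
 * the caller retry later instead. */
int demo_stream_write(demo_stream *stream, size_t len)
{
    if (stream->fd == DEMO_FD_INVALID)
        return DEMO_RC_TRY_AGAIN;
    return demo_send_write_request(stream, len);
}

/* Mark the descriptor invalid as part of closing, so that a second close
 * does not issue another request. close_requests counts issued requests. */
void demo_stream_close(demo_stream *stream, int *close_requests)
{
    if (stream->fd == DEMO_FD_INVALID)
        return;
    (*close_requests)++;
    stream->fd = DEMO_FD_INVALID;
}
```

In this model a write on a stream whose files failed to initialize returns `-2` without touching the file thread, and closing the same stream twice produces a single close request.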
[tickets] [opensaf:tickets] #2019 amf: Unit tests fail to build
- **status**: review --> fixed
- **assigned_to**: Long HB Nguyen --> nobody
- **Comment**:

changeset: 8061:da089e8f337c
tag: tip
parent: 8059:9eb1e54daa76
user: Long Nguyen
date: Tue Sep 13 19:12:26 2016 +1000
summary: amf: Unit tests fail to build [#2019]

changeset: 8060:901da236b68c
branch: opensaf-5.1.x
parent: 8058:a2d3ea8d848f
user: Long Nguyen
date: Tue Sep 13 19:10:22 2016 +1000
summary: amf: Unit tests fail to build [#2019]

---

** [tickets:#2019] amf: Unit tests fail to build**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Fri Sep 09, 2016 01:08 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 04:33 AM UTC
**Owner:** nobody

"make check" fails (32-bit system, GCC version 6.1.1, googletest version 48ee8e98abc950abd8541e15550b18f8f6cfb3a9):

~~~
make[8]: Entering directory '/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfd/tests'
  CXX      testamfd-test_ckpt_enc_dec.o
In file included from test_ckpt_enc_dec.cc:22:0:
/home/opensaf/googletest/googletest/include/gtest/gtest.h: In instantiation of 'testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int; T2 = int]':
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1421:23:   required from 'static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int; T2 = int; bool lhs_is_null_literal = false]'
test_ckpt_enc_dec.cc:354:3:   required from here
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1392:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
   if (lhs == rhs) {
       ^~
/home/opensaf/googletest/googletest/include/gtest/gtest.h: In instantiation of 'testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = long long unsigned int; T2 = long long int]':
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1421:23:   required from 'static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = long long unsigned int; T2 = long long int; bool lhs_is_null_literal = false]'
test_ckpt_enc_dec.cc:362:3:   required from here
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1392:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
cc1plus: all warnings being treated as errors
Makefile:814: recipe for target 'testamfd-test_ckpt_enc_dec.o' failed
make[8]: *** [testamfd-test_ckpt_enc_dec.o] Error 1
make[8]: Leaving directory '/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfd/tests'
~~~
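The build error in #2019 comes from equality checks whose operands differ in signedness (`T1 = unsigned int; T2 = int`). The sketch below shows, in plain C with hypothetical `demo_*` helpers, why the compiler flags such comparisons; it does not reproduce the test code or the actual fix in the changeset. A common remedy is to give both operands the same signedness, for example by using an unsigned literal.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Mixed comparison with the conversion written out: in `unsigned == int`
 * the int operand is converted to unsigned, so -1 becomes UINT_MAX. */
bool demo_mixed_equal(unsigned int lhs, int rhs)
{
    return lhs == (unsigned int)rhs;
}

/* Value-preserving comparison: a negative rhs can never equal an
 * unsigned lhs, so it is rejected before converting. */
bool demo_safe_equal(unsigned int lhs, int rhs)
{
    return rhs >= 0 && lhs == (unsigned int)rhs;
}
```

Here `demo_mixed_equal(UINT_MAX, -1)` is true although the two values differ mathematically, which is the kind of surprise `-Wsign-compare` is meant to catch.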
[tickets] [opensaf:tickets] #1987 AMF: Admin operation continuation of nodegroup entity leaves SG unstable
- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau --> nobody
- **Comment**:

changeset: 8059:9eb1e54daa76
tag: tip
parent: 8057:8081a9ddd2fc
user: minh-chau
date: Tue Sep 13 17:59:19 2016 +1000
summary: AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987] V2

changeset: 8058:a2d3ea8d848f
branch: opensaf-5.1.x
parent: 8056:280d00e0eba1
user: minh-chau
date: Tue Sep 13 17:55:36 2016 +1000
summary: AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987] V2

---

** [tickets:#1987] AMF: Admin operation continuation of nodegroup entity leaves SG unstable**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Aug 31, 2016 12:00 AM UTC by Minh Hon Chau
**Last Updated:** Mon Sep 05, 2016 01:58 AM UTC
**Owner:** nobody

Steps to reproduce on a nodegroup supporting 2N:

- Lock nodegroup
- Delay csi quiesced callback
- Stop SC
- Restart SC
- Release csi quiesced callback

Observation: SG remains unstable