date:20160913

- **status**: assigned --> wontfix
- **Comment**:

Sep  8 16:56:06.484169 osafimmnd [3620:immnd_proc.c:1679] T5 tmout:1000 ste:10 
ME:5 RE:5 crd:1 rim:FROM_FILE 4.3A:1 2Pbe:0 VetA/B: 0/0 othsc:1/2010f
Sep  8 16:56:08.495410 osafimmnd [3620:ImmModel.cc:14077] T5 Timeout on Search 
continuation 564113889559073
Sep  8 16:56:08.495530 osafimmnd [3620:ImmModel.cc:14200] T5 Timeout on 
sImplDetachTime implid:0
Sep  8 16:56:08.495606 osafimmnd [3620:immnd_proc.c:1236] T5 Timeout on search 
op waiting on 1 implementer(s)
Sep  8 16:56:08.495634 osafimmnd [3620:ImmModel.cc:12830] >> 
fetchSearchReqContinuation
Sep  8 16:56:08.495660 osafimmnd [3620:ImmModel.cc:12838] T5 REQ SEARCH 
CONTINUATION 544 FOUND FOR 564113889559073
Sep  8 16:56:08.495687 osafimmnd [3620:ImmModel.cc:12841] << 
fetchSearchReqContinuation
Sep  8 16:56:08.495708 osafimmnd [3620:immnd_evt.c:1029] >> search_req_continue
Sep  8 16:56:08.495735 osafimmnd [3620:immnd_evt.c:1044] T2 SEARCH NEXT, Look 
for id:545
Sep  8 16:56:08.496293 osafimmnd [3620:immnd_evt.c:1195] TR Finalizing search 
node, err = 5
Sep  8 16:56:08.496337 osafimmnd [3620:ImmModel.cc:1761] TR Deleting iterator 
searchOp 0x7ebcb0
Sep  8 16:56:08.496367 osafimmnd [3620:immnd_evt.c:1375] >> freeSearchNext
Sep  8 16:56:08.496388 osafimmnd [3620:immnd_evt.c:1377] T2 
objectName:DistObj1=DistRunTime
Sep  8 16:56:08.496416 osafimmnd [3620:immnd_evt.c:1412] << freeSearchNext
Sep  8 16:56:08.496434 osafimmnd [3620:immnd_evt.c:1221] << search_req_continue
Sep  8 16:56:08.997941 osafimmnd [3620:immsv_evt.c:5422] T8 Received: 
IMMND_EVT_A2ND_SEARCHNEXT (17) from 2020f
Sep  8 16:56:08.998012 osafimmnd [3620:immnd_evt.c:1498] >> 
immnd_evt_proc_search_next
Sep  8 16:56:08.998028 osafimmnd [3620:immnd_evt.c:1509] T2 SEARCH NEXT, Look 
for id:545
Sep  8 16:56:08.998504 osafimmnd [3620:immnd_evt.c:1520] ER Could not find 
search node for search-ID:545
Sep  8 16:56:08.999722 osafimmnd [3620:immnd_evt.c:1732] << 
immnd_evt_proc_search_next

1. The search operation times out while waiting for runtime attributes, and 
the search operation is freed.
2. When SearchNext is called again, the search ID is not found and 
BAD_HANDLE is returned.
3. The search handle is not being corrupted.

The code flow has not changed since 5.0.
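
The sequence in points 1 and 2 can be illustrated with a small self-contained 
model. This is only a sketch: the `search_table` type, the helper names, and 
the return codes below are hypothetical stand-ins, not the actual IMMND data 
structures or the SA_AIS_* constants.

```c
#include <stdbool.h>

/* Hypothetical return codes for this sketch (not the SA_AIS_* values). */
enum search_rc { SEARCH_OK, SEARCH_TIMEOUT, SEARCH_BAD_HANDLE };

#define MAX_SEARCHES 8

/* Hypothetical table of live search nodes, keyed by search id. */
struct search_table {
  unsigned ids[MAX_SEARCHES];
  bool in_use[MAX_SEARCHES];
};

/* Return the slot holding the given search id, or -1 if it is gone. */
static int find_search(const struct search_table *t, unsigned id) {
  for (int i = 0; i < MAX_SEARCHES; i++)
    if (t->in_use[i] && t->ids[i] == id) return i;
  return -1;
}

/* Register a new search; returns false if the table is full. */
bool search_add(struct search_table *t, unsigned id) {
  for (int i = 0; i < MAX_SEARCHES; i++) {
    if (!t->in_use[i]) {
      t->ids[i] = id;
      t->in_use[i] = true;
      return true;
    }
  }
  return false;
}

/* One SearchNext step. If the implementer did not answer in time, the
 * search node is freed and a timeout is returned; any later call with
 * the same id no longer finds the node. */
enum search_rc search_next(struct search_table *t, unsigned id,
                           bool implementer_timed_out) {
  int slot = find_search(t, id);
  if (slot < 0) return SEARCH_BAD_HANDLE; /* search node not found */
  if (implementer_timed_out) {
    t->in_use[slot] = false;              /* node deleted on timeout */
    return SEARCH_TIMEOUT;
  }
  return SEARCH_OK;
}
```

In this model, a timed-out `search_next()` followed by a second call with the 
same id yields `SEARCH_TIMEOUT` and then `SEARCH_BAD_HANDLE`, matching the 
trace above.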





---

** [tickets:#2013] IMM: Search Handle getting corrupt when 
saImmOmSearchNext_2() returns ERR_TIMEOUT**

**Status:** wontfix
**Milestone:** 5.1.RC2
**Created:** Thu Sep 08, 2016 12:10 PM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 10:06 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[SearchTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2013/attachment/SearchTmOut.zip)
 (883.9 kB; application/zip)


OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes

Summary:
Steps to Reproduce
1. Create a runtime/config object
2. Do Search Initialize()
3. Delete the object created in Step1
4. Do SearchNext() 
5. Do SearchNext() again 


Observed Behavior:
Step 4 returns SA_AIS_ERR_TIMEOUT (expected)
Step 5 returns SA_AIS_ERR_BAD_HANDLE **(SA_AIS_ERR_NOT_EXIST is expected)**

**Note: Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

---
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2033 AMF: Update documentation for admin op continuation after headless




---

** [tickets:#2033] AMF: Update documentation for admin op continuation after 
headless**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Wed Sep 14, 2016 06:09 AM UTC by Minh Hon Chau
**Last Updated:** Wed Sep 14, 2016 06:09 AM UTC
**Owner:** Minh Hon Chau


Need to update README/PR doc for #1987 (fixed) and maybe #1988 (review)



[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-13 Thread A V Mahesh (AVM)

- **status**: assigned --> review
- **Milestone**: 4.7.2 --> 5.0.1
- **Comment**:

Split-brain is a different issue, and we have ticket #2030 to debug the 
split-brain case, so I published the patch for this ticket.



---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** review
**Milestone:** 5.0.1
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Tue Sep 13, 2016 04:39 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


OS environment:

Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network eth0, bonded, OVS (I have tried all of them and the problem is 
there in all configurations)


In 20% of cases, a "reboot -f" on controller2 is not detected and acted on. 
The mds.log contains:

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>

Still, there is nothing in the syslog indicating that controller2 has left the 
cluster. This is for TCP.
When the node comes back online (without OpenSAF being started), controller1 
finally notices and fails over the apps. 

When the reboot is not detected, the TCP keepalives stop and the connection 
goes into retransmits instead. I have attached two tshark sessions captured on 
controller1, covering traffic between controller1 and controller2. The failed 
reboot detection is captured in "ctrl2_failed_detection.trc", and a working 
detection is in "ctrl2_working.trc". I have also attached all logs in 
/var/log/opensaf and the syslog (all from controller1).

It appears to me that we are hitting something similar to 
"http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect"

// Jonas
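
The name of the attached patch (tcp_user_timeout_2014.patch) points at the 
Linux TCP_USER_TIMEOUT socket option, which bounds how long transmitted data 
may remain unacknowledged before the kernel drops the connection. A minimal 
sketch, assuming Linux, of setting and reading that option; the helper names 
and any timeout value are illustrative and are not taken from the patch.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: limit how long sent data may stay unacknowledged
 * before the kernel closes the connection (Linux-specific option).
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms) {
  return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                    &timeout_ms, sizeof(timeout_ms));
}

/* Hypothetical helper: read the current value back.
 * Returns 0 on success, -1 on error with errno set by getsockopt(). */
int get_tcp_user_timeout(int fd, unsigned int *timeout_ms) {
  socklen_t len = sizeof(*timeout_ms);
  return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}
```

With such an option set on the inter-node sockets, a peer that disappears 
while data is in flight would be reported as a connection error after the 
configured time, rather than after the much longer retransmission backoff.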



[tickets] [opensaf:tickets] #2032 ckpt: ckpttest for long dn (5 55, 5 57, 7 12) is failing

2016-09-13 Thread Quyen Dao




---

** [tickets:#2032] ckpt: ckpttest for long dn (5 55, 5 57, 7 12) is failing**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed Sep 14, 2016 04:40 AM UTC by Quyen Dao
**Last Updated:** Wed Sep 14, 2016 04:40 AM UTC
**Owner:** nobody


Changeset: 8064:99410ba8cc21

root@SC-1:~# immcfg -a longDnsAllowed=1 
opensafImm=opensafImm,safApp=safImmService
root@SC-1:~# export SA_ENABLE_EXTENDED_NAMES=1

root@SC-1:~# ckpttest 5 55

Suite 5: CKPT API saCkptCheckpointOpen()
   55  FAILED   To verify creating a ckpt with invalid extended name length 
(expected OUT_OF_RANGE, got SA_AIS_OK (1));

=

   Test Result:
  Total:  1
  Passed: 0
  Failed: 1

root@SC-1:~# ckpttest 5 57

Suite 5: CKPT API saCkptCheckpointOpen()
   57  FAILED   To verify openAsync a ckpt with invalid extended name length 
(expected OUT_OF_RANGE, got SA_AIS_OK (1));

=

   Test Result:
  Total:  1
  Passed: 0
  Failed: 1
root@SC-1:~# ckpttest 7 12

Suite 7: CKPT API saCkptCheckpointUnlink()
   12  FAILED   To test unlink a ckpt with invalid extended name (expected 
OUT_OF_RANGE, got SA_AIS_OK (1));

=

   Test Result:
  Total:  1
  Passed: 0
  Failed: 1
root@SC-1:~#




[tickets] [opensaf:tickets] #1991 AMF: Existing PG tracking should not be stopped for CURRENT flag

2016-09-13 Thread Long HB Nguyen

- **status**: unassigned --> accepted
- **assigned_to**: Long HB Nguyen



---

** [tickets:#1991] AMF: Existing PG tracking should not be stopped  for CURRENT 
flag**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 09:44 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 10:09 AM UTC
**Owner:** Long HB Nguyen


5.1.FC : changeset - 6997

Issue : Existing PG tracking should not be stopped  for CURRENT call


Steps performed :

-> Call saAmfInitialize_4()
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CHANGES flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrackStop()


Observed output :

TrackStop returns ERR_NOT_EXIST, indicating that tracking was not started 
earlier. 


Expected output:

The TrackStop() API should return SA_AIS_OK; in the earlier release, the API 
returns SA_AIS_OK.

According to the B04.01 spec, section 7.11.1, page 318, tracking should not be 
stopped until TrackStop() is called explicitly:

Once saAmfProtectionGroupTrack_4() has been called with trackFlags
containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification
callbacks can only be stopped by an invocation of
saAmfProtectionGroupTrackStop().
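
The expected behavior can be modelled with a few lines of C. This is a sketch 
under stated assumptions: the flag names, return codes, and the 
`pg_track_state` type are hypothetical and only mirror the roles of the 
SA_TRACK_* flags and SA_AIS_* codes; they are not the AMF agent's code.

```c
#include <stdbool.h>

/* Hypothetical flag and return-code names for this sketch; the real
 * SA_TRACK_* and SA_AIS_* definitions live in the SAF headers. */
enum { TRACK_CURRENT = 0x1, TRACK_CHANGES = 0x2, TRACK_CHANGES_ONLY = 0x4 };
enum track_rc { TRACK_OK, TRACK_NOT_EXIST };

/* Hypothetical per-protection-group tracking state. */
struct pg_track_state {
  bool tracking; /* true once CHANGES or CHANGES_ONLY was requested */
};

/* Model of a Track() call: a CURRENT-only request is a one-shot query
 * and must leave any existing change tracking untouched. */
enum track_rc pg_track(struct pg_track_state *s, unsigned flags) {
  if (flags & (TRACK_CHANGES | TRACK_CHANGES_ONLY))
    s->tracking = true;
  /* TRACK_CURRENT alone: report current members, do not reset state. */
  return TRACK_OK;
}

/* Model of TrackStop(): only this call ends change tracking. */
enum track_rc pg_track_stop(struct pg_track_state *s) {
  if (!s->tracking) return TRACK_NOT_EXIST;
  s->tracking = false;
  return TRACK_OK;
}
```

Replaying the steps performed (CURRENT, CHANGES, CURRENT, then stop) against 
this model returns `TRACK_OK` from `pg_track_stop()`, which is the behavior the 
ticket expects.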




[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2016-09-13 Thread A V Mahesh (AVM)

- **status**: unassigned --> assigned
- **assigned_to**: A V Mahesh (AVM)



---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler 
configuration"**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 01:01 PM UTC
**Owner:** A V Mahesh (AVM)


osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:

~~~
Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed in quick succession, with perhaps a 
one-second delay between them.

It seems that osafdtmd asserts and dies when this happens. Here is the result 
from a second run of the above test:

~~~
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: 
dtm_process_node_info: Assertion '0' failed.
Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'SC-1'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-4'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-5'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-3'
Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the 
node
Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; 
timeout=60

~~~

Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf 
nodestart" above. Thus, there are probably two SC-2 nodes at the same time, and 
the error message "Node already exit in the cluster with smiler configuration" 
should be interpreted as "duplicate node detected in the network". Reducing the 
priority of this defect to "minor". Two problems still ought to be fixed: the 
error message should be changed so that it is clear what it means, and osafdtmd 
should not assert (it could call opensaf_reboot() if there is a configuration 
problem, but asserting indicates a software problem).
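
The second suggestion can be sketched as follows. All names here 
(`node_table`, `node_add`, `node_add_message`) are hypothetical and do not 
correspond to the real osafdtmd functions; the point is only that a duplicate 
node is reported to the caller as a configuration error instead of hitting an 
assertion.

```c
#include <stdint.h>

/* Hypothetical outcome codes for this sketch. */
enum node_add_rc { NODE_ADDED, NODE_DUPLICATE, NODE_TABLE_FULL };

#define MAX_NODES 16

/* Hypothetical minimal node table keyed by node id. */
struct node_table {
  uint32_t ids[MAX_NODES];
  int count;
};

/* Add a node. A duplicate id is returned as an error code rather than
 * triggering assert(0), so the caller can log a clear message and pick
 * a reaction suited to a configuration problem. */
enum node_add_rc node_add(struct node_table *t, uint32_t node_id) {
  for (int i = 0; i < t->count; i++)
    if (t->ids[i] == node_id) return NODE_DUPLICATE;
  if (t->count == MAX_NODES) return NODE_TABLE_FULL;
  t->ids[t->count++] = node_id;
  return NODE_ADDED;
}

/* Message text per outcome; the duplicate wording follows the
 * interpretation given in the ticket update. */
const char *node_add_message(enum node_add_rc rc) {
  switch (rc) {
    case NODE_DUPLICATE:  return "duplicate node detected in the network";
    case NODE_TABLE_FULL: return "node table full";
    default:              return "node added";
  }
}
```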





[tickets] [opensaf:tickets] #1895 ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found

2016-09-13 Thread Canh Truong

- **Milestone**: 5.1.RC2 --> 4.7.2



---

** [tickets:#1895] ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 
12 not found**

**Status:** review
**Milestone:** 4.7.2
**Created:** Fri Jun 24, 2016 03:24 AM UTC by Vo Minh Hoang
**Last Updated:** Tue Sep 13, 2016 10:11 AM UTC
**Owner:** Canh Truong
**Attachments:**

- 
[osafntfd.txt](https://sourceforge.net/p/opensaf/tickets/1895/attachment/osafntfd.txt)
 (705.5 kB; text/plain)


Failure when running test suite 2: 

osafntfd [463:ntfs_evt.c:0338] >> proc_unsubscribe_msg: client_id 28, 
subscriptionId 111
osafntfd [463:NtfAdmin.cc:0553] ER NtfAdmin::subscriptionRemoved client 28 not 
found
osafntfd [463:ntfs_evt.c:0341] << proc_unsubscribe_msg 

Currently, when finalizing the last client, ntfa uninstalls the MDS connection.
This causes an NCSMDS_DOWN event to be sent to ntfs, and ntfs removes all 
clients related to this MDS connection.
But if a new client is initialized immediately after finalizing, ntfs may 
receive the initialization message before the NCSMDS_DOWN event. The new client 
is then removed without being finalized, and the subsequent subscribe action 
fails.
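
The ordering problem can be reproduced in a toy model. This is a sketch with 
hypothetical names (`server`, `on_initialize`, `on_mds_down`), and it assumes 
that the new client uses the same MDS destination as the one that just went 
down; it is not the ntfs implementation.

```c
#include <stdbool.h>

#define MAX_CLIENTS 8

/* Hypothetical client record: an id plus the MDS destination it uses. */
struct client { unsigned id; unsigned mds_dest; bool live; };

struct server { struct client clients[MAX_CLIENTS]; };

/* Handle an "initialize" message: register a client for a destination. */
bool on_initialize(struct server *s, unsigned id, unsigned mds_dest) {
  for (int i = 0; i < MAX_CLIENTS; i++) {
    if (!s->clients[i].live) {
      s->clients[i] = (struct client){ id, mds_dest, true };
      return true;
    }
  }
  return false;
}

/* Handle a "destination down" event: drop every client that uses it. */
void on_mds_down(struct server *s, unsigned mds_dest) {
  for (int i = 0; i < MAX_CLIENTS; i++)
    if (s->clients[i].live && s->clients[i].mds_dest == mds_dest)
      s->clients[i].live = false;
}

/* Look up a client, as a later subscribe or unsubscribe request would. */
bool client_exists(const struct server *s, unsigned id) {
  for (int i = 0; i < MAX_CLIENTS; i++)
    if (s->clients[i].live && s->clients[i].id == id) return true;
  return false;
}
```

If the down event is processed first and the initialize second, the new client 
survives. In the reversed order, `on_mds_down()` also removes the freshly 
registered client, and a later lookup reports "client not found" as in the 
trace above.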


Similar ticket: https://sourceforge.net/p/opensaf/tickets/1818/



[tickets] [opensaf:tickets] #2031 imm:README files are missing when opensaf is downloaded




---

** [tickets:#2031] imm:README files are missing when opensaf is downloaded**

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Wed Sep 14, 2016 02:40 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Sep 14, 2016 02:40 AM UTC
**Owner:** Neelakanta Reddy


Update the Makefile.am with all README files



[tickets] [opensaf:tickets] #2028 log: write_log_record_hdl get bad file descriptor

- Description has changed:

Diff:



--- old
+++ new
@@ -10,11 +10,11 @@
   TRACE("%s - stream files initiated", __FUNCTION__);
 }
 ```
-In that case - `p_fd = -1`, `log_stream_write_h` should inform the client 
TRY_AGAIN by returning the value `(-2)`.
+In that case - `p_fd = -1`, `log_stream_write_h` should inform the client 
TRY_AGAIN.
 
 Besides, there is an other problem at file closing. Look at the functions 
`fileclose_hdl` and `fileclose_h`.  The file descriptor should be set to 
`invalid` in `fileclose_hdl`,  otherwise `close file` request will re-send to 
the file handle thread even that file is already closed. 
 
-Above cases usually happens when the file sytem is busy.  Osaflogd TRACE:
+Above cases usually happens when the file sytem is busy.  Extract from syslog:
 
 > 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or 
 > resource busy
 > 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or 
 > resource busy






---

** [tickets:#2028] log: write_log_record_hdl get bad file descriptor**

**Status:** review
**Milestone:** 5.0.1
**Created:** Tue Sep 13, 2016 09:54 AM UTC by Vu Minh Nguyen
**Last Updated:** Tue Sep 13, 2016 10:50 AM UTC
**Owner:** Vu Minh Nguyen


In the current code, logsv passes the `WRITE REQUEST` to the file handle thread 
even when the file descriptor is invalid.
Here is an excerpt from log_stream_write_h() in lgs_stream.cc:
``` C
log_initiate_stream_files(stream);

if (*stream->p_fd == -1) {
  TRACE("%s - Initiating stream files \"%s\" Failed", __FUNCTION__,
        stream->name.c_str());
} else {
  TRACE("%s - stream files initiated", __FUNCTION__);
}
```
In that case (`p_fd = -1`), `log_stream_write_h` should return TRY_AGAIN to 
the client.

Besides, there is another problem at file closing. Look at the functions 
`fileclose_hdl` and `fileclose_h`. The file descriptor should be set to 
`invalid` in `fileclose_hdl`; otherwise the `close file` request will be 
re-sent to the file handle thread even though the file is already closed. 
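
Both points can be sketched in isolation. The types, names, and return codes 
below are hypothetical simplifications for illustration; they are not the 
logsv functions, and the actual hand-off to the file handle thread is left 
out.

```c
/* Hypothetical result codes for this sketch. */
enum write_rc { WRITE_OK, WRITE_TRY_AGAIN };

#define FD_INVALID (-1)

/* Hypothetical, simplified stream: only the file descriptor matters. */
struct stream { int fd; };

/* Refuse to queue a write when the descriptor is invalid, so the
 * client is told to retry and the request never reaches the file
 * handle thread. */
enum write_rc stream_write(const struct stream *s) {
  if (s->fd == FD_INVALID) return WRITE_TRY_AGAIN;
  /* ... hand the write request to the file handle thread here ... */
  return WRITE_OK;
}

/* Mark the descriptor invalid once a close has been requested, so a
 * second close request is not sent for the same file.
 * Returns 1 if a close request was issued, 0 if already closed. */
int stream_close(struct stream *s) {
  if (s->fd == FD_INVALID) return 0;
  /* ... ask the file handle thread to close s->fd here ... */
  s->fd = FD_INVALID;
  return 1;
}
```

After `stream_close()`, a repeated close is a no-op and a write returns 
`WRITE_TRY_AGAIN` instead of failing later with "Bad file descriptor".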

The above cases usually happen when the file system is busy.  Extract from syslog:

> 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or 
> resource busy
> 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or 
> resource busy
> 2016-07-02 00:32:52 SC-1 osaflogd[460]: ER write_log_record_hdl - write 
> FAILED: Bad file descriptor






[tickets] [opensaf:tickets] #1988 AMF: Admin operation continuation does not work with short cluster init timeout

- **status**: assigned --> review



---

** [tickets:#1988] AMF: Admin operation continuation does not work with short 
cluster init timeout**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 12:04 AM UTC by Minh Hon Chau
**Last Updated:** Tue Sep 13, 2016 11:02 AM UTC
**Owner:** Minh Hon Chau


In the scenario of admin operation continuation after headless, if 
saAmfClusterStartupTimeout is configured with a short value, the admin 
continuation starts when saAmfClusterStartupTimeout expires while the SU is 
still OUT OF SERVICE. The eventual result is failure of the admin operation 
after headless.



[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT

The syslog, imma, and immnd traces do not match.
Please provide the correct logs.


---

** [tickets:#2001] IMM: Owner handle is getting corrupt when 
OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 10:06 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while waiting in the callback 
for longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again and do not wait, OR invoke 
any Ccb operation

Observed Behavior:
Step 1 returns SA_AIS_ERR_TIMEOUT (expected)
Step 2 returns SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 
svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over 
MDS. Discarding admin op reply.
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 
21 - ignoring
Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down 
on syncronous request, discarding request
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 
2010f> (testOiTmout_verifyAdminOpCallback_37)


Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached



[tickets] [opensaf:tickets] #1997 IMM: immnd fails to update si while bringing up opensaf with 2PBE

- **status**: assigned --> unassigned
- **assigned_to**: Neelakanta Reddy -->  nobody 
- **Component**: imm --> amf
- **Comment**:

Sep  2 16:54:13 SLOT1 osafimmpbed: WA Start prepare for ccb: 
10004/4294967300 towards slave PBE returned: '12' from Immsv
Sep  2 16:54:13 SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA update 
Ccb:10004/4294967300 towards PBE-B
Sep  2 16:54:13 SLOT1 osafimmpbed: NO 2PBE Error (18) in PRTA update 
(ccbId:10004)
Sep  2 16:54:13 SLOT1 osafimmnd[3632]: WA update of PERSISTENT runtime 
attributes in object 'safSi=NoRed3,safApp=OpenSAF' REVERTED. PBE rc:18
Sep  2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18

In the 2PBE case, both PBEs on the controllers must be up. According to the 
logs, only the PBE at slot1 is up, and slot2 has not yet joined the cluster. 
The RT-update fails because the slot2 PBE is not available.

From the AMF perspective, this has to be analyzed, or the Error can be changed 
to a Warning for RT-updates.
Sep  2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18



---

** [tickets:#1997] IMM: immnd fails to update si while bringing up opensaf with 
2PBE**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Fri Sep 02, 2016 11:46 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 10:08 AM UTC
**Owner:** nobody
**Attachments:**

- 
[LogAMF.zip](https://sourceforge.net/p/opensaf/tickets/1997/attachment/LogAMF.zip)
 (432.4 kB; application/zip)


setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
2PBE enabled

Bring up OpenSAF on a controller with 2PBE enabled. IMMND throws an error.
Attachments: syslog, amfd and immnd traces

Sep  2 16:54:13 SLOT1 osafimmpbed: WA Start prepare for ccb: 
10004/4294967300 towards slave PBE returned: '12' from Immsv
Sep  2 16:54:13 SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA update 
Ccb:10004/4294967300 towards PBE-B
Sep  2 16:54:13 SLOT1 osafimmpbed: NO 2PBE Error (18) in PRTA update 
(ccbId:10004)
**Sep  2 16:54:13 SLOT1 osafimmnd[3632]: WA update of PERSISTENT runtime 
attributes in object 'safSi=NoRed3,safApp=OpenSAF' REVERTED. PBE rc:18
Sep  2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18**
Sep  2 16:54:14 SLOT1 osafimmnd[3632]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db

Note:
1. OpenSAF is successfully started
2. Issue not seen with 1PBE

Once the controller is up, amf-state si gives:

safSi=SC-2N,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=NoRed4,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed1,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=NoRed2,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed3,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)





[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-13 Thread Jonas Arndt

Anders, it is possible. I am seeing the same entry in my system when I get the 
split-brain.

After I fixed the MAC in OVS the problem went away though.



---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Tue Sep 13, 2016 12:14 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[logs.tgz](https://sourceforge.net/p/opensaf/tickets/2014/attachment/logs.tgz) 
(84.1 kB; application/x-compressed-tar)
- 
[tcp_user_timeout_2014.patch](https://sourceforge.net/p/opensaf/tickets/2014/attachment/tcp_user_timeout_2014.patch)
 (5.5 kB; application/octet-stream)


OS environment:

Debian Jessie (OpenSAF is running on bare metal, no containers or VMs)
4.4.7 kernel
Network eth0, bonded, OVS (I have tried all of them and the problem is 
there in all configurations)


In 20% of the cases a "reboot -f" on  controller2 is not detected and acted on. 
What is in the mds.log is .

Sep  7  6:44:23.918566 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:23.918595 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:34.018662 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:34.018751 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:34.018789 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:34.018818 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:44.118832 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:44.118919 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:44.118955 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:44.118984 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>
Sep  7  6:44:54.218987 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Sep  7  6:44:54.219085 osafamfd[41365] ERR  |MDS_SND_RCV: Timeout occured on 
red sndrsp message from svc_id = MBCSV(19), to svc_id = MBCSV(19)
Sep  7  6:44:54.219139 osafamfd[41365] ERR  |MDS_SND_RCV: Adest=<0x,1>
Sep  7  6:44:54.219168 osafamfd[41365] ERR  |MDS_SND_RCV: 
Anchor=<0x0002020f,1790>

Still, there is nothing in the syslog indicating that controller2 has left the 
cluster. This is for TCP.
When the node comes back online (without OpenSAF being started), controller1 
finally notices and fails over the apps.

When the reboot is not detected, the TCP keepalives stop and the connection 
goes into retransmissions instead. I have attached two tshark sessions captured 
on controller1, covering the traffic between controller1 and controller2. The 
failed reboot detection is captured in "ctrl2_failed_detection.trc", and a 
working detection in "ctrl2_working.trc". I have also attached all logs in 
/var/log/opensaf and the syslog (all from controller1).

It appears to me that we are hitting something similar to 
http://stackoverflow.com/questions/33553410/tcp-retranmission-timer-overrides-kills-tcp-keepalive-timer-delaying-disconnect

// Jonas
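
The behaviour described above (keepalive probes giving way to retransmissions while unacknowledged data is in flight) is what the Linux `TCP_USER_TIMEOUT` socket option is meant to bound. The attached patch is named after that option, but its contents are not shown in this thread, so the following is only a minimal sketch of how the option is commonly combined with keepalive settings. The helper name `set_peer_failure_timeouts` and the way the timeout is derived from the keepalive values are assumptions, not taken from the patch.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the OpenSAF patch): enable keepalive and cap
 * the time unacknowledged data may stay in flight, so that a silently
 * rebooted peer is detected whether the connection is idle or retransmitting.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int set_peer_failure_timeouts(int fd, int idle_s, int intvl_s, int probes)
{
	int on = 1;
	/* Assumption: align the user timeout (milliseconds) with the keepalive budget. */
	unsigned int user_timeout_ms =
	    (unsigned int)(idle_s + intvl_s * probes) * 1000U;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
		       sizeof(user_timeout_ms)) < 0)
		return -1;
	return 0;
}
```

With illustrative values such as `set_peer_failure_timeouts(fd, 2, 1, 3)`, the user timeout comes out at 5000 ms. Suitable values for a cluster transport would have to be chosen against the failover requirements.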


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

- Description has changed:

Diff:



--- old
+++ new
@@ -39,3 +39,6 @@
 Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; 
timeout=60
 
 ~~~
+
+Update: it seems I forgot to do "./opensaf nodestop" between the two 
"./opensaf nodestart" above. Thus, there are probably two SC-2 nodes at the 
same time, and the error message "Node already exit in the cluster with smiler 
configuration" should be interpreted as "duplicate node detected in the 
network". Reducing the priority of this defect to "minor". Still two problems 
ought to be fixed: the error message should be changed so that it is clear what 
it means, and osafdtmd should not assert (it could call opensaf_reboot() if a 
there is a configuration problem, but asserting idicates a software problem).
+



- **Priority**: major --> minor



---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler 
configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:30 PM UTC
**Owner:** nobody


osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:

~~~
Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed in quick succession, maybe with a 
one second delay in between them.

It seems that osafdtmd asserts and dies when this happens. Here is the result 
from a second run of the above test:

~~~
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: 
dtm_process_node_info: Assertion '0' failed.
Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'SC-1'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-4'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-5'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-3'
Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the 
node
Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; 
timeout=60

~~~

Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf 
nodestart" above. Thus, there are probably two SC-2 nodes at the same time, and 
the error message "Node already exit in the cluster with smiler configuration" 
should be interpreted as "duplicate node detected in the network". Reducing the 
priority of this defect to "minor". Still, two problems ought to be fixed: the 
error message should be changed so that it is clear what it means, and osafdtmd 
should not assert (it could call opensaf_reboot() if there is a configuration 
problem, but asserting indicates a software problem).
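
The two fixes suggested above can be sketched roughly as follows. This is not the osafdtmd code: `node_info_t`, `is_duplicate_node`, `handle_joining_node` and the `reboot_fn` callback are hypothetical names, the callback merely stands in for opensaf_reboot() (whose real signature is not assumed here), and treating a matching node id or IP address as a duplicate is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal node record; the real DTM structures are not shown in the ticket. */
typedef struct {
	uint32_t node_id;
	const char *node_ip;
} node_info_t;

/* Callback standing in for opensaf_reboot(). */
typedef void (*reboot_fn)(const char *reason);

/* Returns 1 if the joining node clashes with a known node (same id or same IP). */
int is_duplicate_node(const node_info_t *known, size_t n_known,
		      const node_info_t *joining)
{
	for (size_t i = 0; i < n_known; i++) {
		if (known[i].node_id == joining->node_id ||
		    strcmp(known[i].node_ip, joining->node_ip) == 0)
			return 1;
	}
	return 0;
}

/*
 * Treat a duplicate as a configuration problem: log an explicit message and
 * request a reboot through the callback instead of calling assert().
 * Returns 0 if the node may join, -1 if it was rejected.
 */
int handle_joining_node(const node_info_t *known, size_t n_known,
			const node_info_t *joining, reboot_fn reboot)
{
	char reason[128];

	if (!is_duplicate_node(known, n_known, joining))
		return 0;

	snprintf(reason, sizeof(reason),
		 "duplicate node detected in the network (node_id: %u, node_ip: %s)",
		 (unsigned)joining->node_id, joining->node_ip);
	fprintf(stderr, "ER DTM: %s\n", reason);
	reboot(reason);
	return -1;
}

/* Recording stub, used only to exercise the sketch. */
static char last_reboot_reason[128];
void record_reboot(const char *reason)
{
	snprintf(last_reboot_reason, sizeof(last_reboot_reason), "%s", reason);
}
```

Passing the reboot action in as a callback keeps the duplicate check separate from the recovery policy, so the same check could also be used to simply reject the joining node.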




---


[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

- Description has changed:

Diff:



--- old
+++ new
@@ -1,9 +1,9 @@
 osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:
 
 ~~~
-var/SC-2/log/messages:Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  
Node already exit in the cluster with smiler configuration , correct the other 
joining Node configuration 
-var/SC-2/log/messages:Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: 
dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
-var/SC-2/log/messages:Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: 
dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
+Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
+Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
+Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed 
.node_ip: 192.168.0.1, node_id: 0
 ~~~
 
 Here are the steps to reproduce this problem in UML:
@@ -16,3 +16,26 @@
 ./opensaf nodestart 2
 
 The last two commands should be execute quickly after each other, maybe with 
one second delay in between them.
+
+It seems that osafdtmd asserts and dies when this happens. Here is the result 
from a second run of the above test:
+
+~~~
+Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in 
the cluster with smiler configuration , correct the other joining Node 
configuration 
+Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: 
dtm_process_node_info: Assertion '0' failed.
+Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, 
conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, 
conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn 
lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'SC-1'
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-4'
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-5'
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 
'PL-3'
+Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting 
the node
+Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; 
timeout=60
+
+~~~






---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler 
configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:17 PM UTC
**Owner:** nobody



[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

Needless to say, the error message itself is also faulty here. I suppose "exit" 
should be "exists", and "smiler" should be "similar"? I am just guessing... :-)


---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler 
configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:10 PM UTC
**Owner:** nobody


osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:

~~~
var/SC-2/log/messages:Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  
Node already exit in the cluster with smiler configuration , correct the other 
joining Node configuration 
var/SC-2/log/messages:Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: 
dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
var/SC-2/log/messages:Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: 
dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed in quick succession, maybe with a 
one second delay in between them.


---


[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

Maybe your split-brain problems could be related to the ticket [#2030] that I 
just filed on DTM?


---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Tue Sep 13, 2016 12:13 PM UTC
**Owner:** A V Mahesh (AVM)

[tickets] [opensaf:tickets] #2014 Rebooted controller not detected in TCP

2016-09-13 Thread Jonas Arndt

I actually need to do more tests. From the patch's point of view, I think it is 
looking good. The split brain seems to be related to OVS bringing up the port 
with a new MAC address every time. I have run some tests on eth0 (without OVS) 
and have not been able to reproduce the split brain. Note that with TIPC as the 
transport, the split brain also never happens, even with OVS. I will run some 
more tests today and get back with a conclusion.

The split brain occurs after "reboot -f" on controller2, when it tries to 
rejoin the cluster after coming back up. After that, the two controllers run 
side by side, both active, and there is no reboot.

Reboot detection now seems to work every time, so the patch definitely fixed 
that.


---

** [tickets:#2014] Rebooted controller not detected in TCP**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 06:20 PM UTC by Jonas Arndt
**Last Updated:** Tue Sep 13, 2016 04:25 AM UTC
**Owner:** A V Mahesh (AVM)


[tickets] [opensaf:tickets] #1816 IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when ERR_LIBRARY was expected

- **status**: accepted --> review



---

** [tickets:#1816] IMM: saImmOiAugmentCcbInitialize returned ERR_TRY_AGAIN when 
ERR_LIBRARY was expected**

**Status:** review
**Milestone:** 4.7.2
**Created:** Mon May 09, 2016 07:27 AM UTC by Chani Srivastava
**Last Updated:** Mon May 09, 2016 07:29 AM UTC
**Owner:** Neelakanta Reddy


This was found as part of validating ticket #1808

Code snippet: imma_oi_api.c:3749

~~~
if(immsv_om_handle_initialize) { /* This is always the first immsv_om_ call */
	rc = immsv_om_handle_initialize(&privateOmHandle, &version);
} else {
	TRACE("ERR_LIBRARY: Error in library linkage. libSaImmOm.so is not linked");
	rc = SA_AIS_ERR_LIBRARY;
}

if(rc != SA_AIS_OK) {
	TRACE("ERR_TRY_AGAIN: failed to obtain internal om handle rc:%u", rc);
	rc = SA_AIS_ERR_TRY_AGAIN;
	goto lock_fail; /* We are not locked and nothing to de-allocate. */
}
~~~

When rc is set to SA_AIS_ERR_LIBRARY in the else branch, there is no goto, so 
the next if condition is evaluated. Because rc != SA_AIS_OK, that block 
overwrites rc with SA_AIS_ERR_TRY_AGAIN.
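
One possible way to keep the two error paths apart is sketched below. It is a self-contained illustration, not the change that was put up for review: `obtain_om_handle`, `om_init_fn` and the two stub initializers are hypothetical, and the error enum is a local stand-in for the SaAisErrorT values from saAis.h.

```c
#include <stddef.h>

/* Local stand-ins for the SaAisErrorT values from saAis.h. */
typedef enum {
	SA_AIS_OK = 1,
	SA_AIS_ERR_LIBRARY = 2,
	SA_AIS_ERR_TRY_AGAIN = 6
} SaAisErrorT;

/* Hypothetical stand-in for the optionally linked immsv_om_handle_initialize symbol. */
typedef SaAisErrorT (*om_init_fn)(void);

/*
 * Return ERR_LIBRARY as soon as the linkage problem is detected, so that the
 * later "rc != SA_AIS_OK" handling can no longer turn it into ERR_TRY_AGAIN.
 */
SaAisErrorT obtain_om_handle(om_init_fn init)
{
	if (init == NULL) {
		/* libSaImmOm.so is not linked: retrying cannot help. */
		return SA_AIS_ERR_LIBRARY;
	}
	if (init() != SA_AIS_OK) {
		/* Failure to obtain the internal OM handle: the caller may retry. */
		return SA_AIS_ERR_TRY_AGAIN;
	}
	return SA_AIS_OK;
}

/* Stub initializers, used only to exercise the sketch. */
SaAisErrorT stub_init_ok(void) { return SA_AIS_OK; }
SaAisErrorT stub_init_fail(void) { return SA_AIS_ERR_LIBRARY; }
```

In the original function the equivalent change would be a `goto lock_fail;` in the else branch, or restricting the second check to the case where the initialize call was actually made.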



---


[tickets] [opensaf:tickets] #1986 log: logtest fails when run after immomtest

- **status**: accepted --> review



---

** [tickets:#1986] log: logtest fails when run after immomtest**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Tue Aug 30, 2016 09:03 AM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 10:10 AM UTC
**Owner:** Vu Minh Nguyen


If I first run immomtest and then logtest, I get the following result:

~~~

Suite 1: Library Life Cycle
    1  PASSED   saLogInitialize() OK
    2  PASSED   saLogInitialize() with NULL pointer to handle
    3  PASSED   saLogInitialize() with NULL pointer to callbacks
    4  PASSED   saLogInitialize() with NULL callbacks AND version
    5  PASSED   saLogInitialize() with uninitialized handle
    6  PASSED   saLogInitialize() with uninitialized version
    7  PASSED   saLogInitialize() with too high release level
    8  PASSED   saLogInitialize() with minor version set to 1
    9  PASSED   saLogInitialize() with major version set to 3
   10  PASSED   saLogSelectionObjectGet() OK
   11  PASSED   saLogSelectionObjectGet() with NULL log handle
   12  PASSED   saLogDispatch() OK
   13  PASSED   saLogFinalize() OK
   14  PASSED   saLogFinalize() with NULL log handle

Suite 2: Log Service Operations
    1  PASSED   saLogStreamOpen_2() system stream OK
    2  PASSED   saLogStreamOpen_2() notification stream OK
    3  PASSED   saLogStreamOpen_2() alarm stream OK
    4  PASSED   Create app stream OK
    5  PASSED   Create and open app stream
    6  PASSED   saLogStreamOpen_2() - NULL ptr to handle
    7  PASSED   saLogStreamOpen_2() - NULL logStreamName
    8  PASSED   Open app stream second time with altered logFileName
    9  PASSED   Open app stream second time with altered logFilePathName
   10  PASSED   Open app stream second time with altered logFileFmt
   11  PASSED   Open app stream second time with altered maxLogFileSize
   12  PASSED   Open app stream second time with altered maxLogRecordSize
   13  PASSED   Open app stream second time with altered maxFilesRotated
   14  PASSED   Open app stream second time with altered haProperty
   15  PASSED   Open app with logFileFmt == NULL
   16  PASSED   Open app stream second time with logFileFmt == NULL
   17  PASSED   Open app stream with NULL logFilePathName
   18  PASSED   Open app stream with '.' logFilePathName
   19  PASSED   Open app stream with invalid logFileFmt
   20  PASSED   Open app stream with unsupported logFullAction
   21  PASSED   Open non exist app stream with NULL create attrs
   22  PASSED   saLogStreamOpenAsync_2(), Not supported
   23  PASSED   saLogStreamOpenCallbackT() OK
   24  PASSED   saLogWriteLog(), Not supported
   25  PASSED   saLogWriteAsyncLog() system OK
   26  PASSED   saLogWriteAsyncLog() alarm OK
   27  PASSED   saLogWriteAsyncLog() notification OK
   28  PASSED   saLogWriteAsyncLog() with NULL logStreamHandle
   29  PASSED   saLogWriteAsyncLog() with invalid logStreamHandle
   30  PASSED   saLogWriteAsyncLog() with invalid ackFlags
   31  PASSED   saLogWriteAsyncLog() with NULL logRecord ptr
   32  PASSED   saLogWriteAsyncLog() logSvcUsrName == NULL
   33  PASSED   saLogWriteAsyncLog() logSvcUsrName == NULL and envset
   34  PASSED   saLogWriteAsyncLog() with logTimeStamp set
   35  PASSED   saLogWriteAsyncLog() without logTimeStamp set
   36  PASSED   saLogWriteAsyncLog() 1800 bytes logrecord (ticket #203)
   37  PASSED   saLogWriteAsyncLog() invalid severity
   38  PASSED   saLogWriteLogAsync() logBufSize > strlen(logBuf) + 1
   39  PASSED   saLogWriteLogAsync() logBufSize > SA_LOG_MAX_RECORD_SIZE
   40  PASSED   saLogWriteLogCallbackT() SA_DISPATCH_ONE
   41  PASSED   saLogWriteLogCallbackT() SA_DISPATCH_ALL
   42  PASSED   saLogFilterSetCallbackT OK
   43  PASSED   saLogStreamClose OK
   44  PASSED   saLogStreamOpen_2 with maxFilesRotated = 0, ERR
   45  PASSED   saLogStreamOpen_2 with maxFilesRotated = 128, ERR
   46  PASSED   saLogStreamOpen_2 with logFileName > 218 characters, ERR
   47  PASSED   saLogStreamOpen_2 with invalid filename
   48  PASSED   saLogStreamOpen_2 with maxLogRecordSize > MAX_RECSIZE, ERR
   49  PASSED   saLogStreamOpen_2 with maxLogRecordSize < 150, ERR
   50  PASSED   saLogStreamOpen_2 with stream number out of the limitation, ERR
   51  PASSED   saLogInitialize() then saLogFinalize() multiple times. keep MDS connection, OK
   52  PASSED   saLogInitialize() then saLogFinalize() multiple times in multiple threads, OK

Suite 3: Limit Fetch API
    1  PASSED   saLogLimitGet(), Not supported

Suite 4: LOG OI tests, stream objects
    1  PASSED   CCB Object Modify saLogStreamFileName
    2  PASSED   CCB Object Modify saLogStreamPathName, ERR not allowed
    3  PASSED   CCB Object

[tickets] [opensaf:tickets] #1990 AMF : Extra notification is received for lock operation on unlocked SG.

2016-09-13 Thread Praveen

- **status**: unassigned --> accepted
- **assigned_to**: Praveen
- **Part**: - --> d
- **Milestone**: 5.1.RC2 --> 4.7.2
- **Comment**:

Analysis:
Except for 2N, the problem exists for all other redundancy models. In the code 
for these models, when the removal response for the last SU arrives, there are 
no SUs left in the oper list. Based on this, AMFD tries to mark the SG as 
locked again, which results in the extra notification.
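
A generic guard against this kind of duplicate is to emit a state change notification only when the state actually changes. The sketch below illustrates that idea in isolation. It is not the AMFD code and not necessarily how the ticket will be fixed: `sg_t`, `admin_state_t` and `sg_set_admin_state` are hypothetical names, and the counter stands in for sending a real notification.

```c
/* Hypothetical, simplified admin states; not the AMF type definitions. */
typedef enum { ADMIN_UNLOCKED, ADMIN_LOCKED } admin_state_t;

/* Hypothetical SG record; the counter stands in for notifications sent via NTF. */
typedef struct {
	admin_state_t admin_state;
	unsigned notifications_sent;
} sg_t;

/*
 * Set the admin state and count one notification, but only if the state
 * really changes. Returns 1 if a notification was generated, 0 otherwise.
 */
int sg_set_admin_state(sg_t *sg, admin_state_t new_state)
{
	if (sg->admin_state == new_state)
		return 0; /* LOCKED -> LOCKED: nothing to report */

	sg->admin_state = new_state;
	sg->notifications_sent++;
	return 1;
}
```

With such a guard, a second attempt to mark an already locked SG as locked would not produce the second notification (Old State: LOCKED, New State: LOCKED) shown in the ticket.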




---

** [tickets:#1990] AMF :  Extra notification is received for lock operation on 
unlocked SG.**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Wed Aug 31, 2016 06:40 AM UTC by Srikanth R
**Last Updated:** Tue Sep 13, 2016 10:10 AM UTC
**Owner:** Praveen


Changeset : 5.1 FC (7997 changeset)

 Extra notification is received for lock operation on unlocked SG.
 
 amf-adm lock safSg=AmfDemo,safApp=AmfDemo
===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_UNLOCKED
New State: SA_AMF_ADMIN_LOCKED

===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED



---


[tickets] [opensaf:tickets] #1969 smf: One step upgrade with cluster reboot does not wait for nodes to start

2016-09-13 Thread elunlen

I think a separate AMF ticket should be written for the AMF part of this 
problem. However, even if the AMF problem is solved, I think SMF should be 
fixed to handle this in a better way, e.g. by having a configurable timeout 
for waiting for nodes.


---

** [tickets:#1969] smf: One step upgrade with cluster reboot does not wait for 
nodes to start**

**Status:** unassigned
**Milestone:** 5.0.1
**Created:** Wed Aug 24, 2016 01:01 PM UTC by elunlen
**Last Updated:** Fri Sep 09, 2016 12:38 PM UTC
**Owner:** nobody


When using the one step upgrade feature with a cluster reboot all nodes will 
restart including the SC-nodes. This is done as the last action in the upgrade 
step. After the active SC-node is up again SMF will continue with the procedure 
wrapup. When collecting information in order to prepare the wrapup the node 
destination for all nodes in the campaign is requested. However, this 
information can only be collected from nodes that have started and joined 
the cluster (unlocked).
The problem is that SMF does not seem to wait in order to give all nodes a 
chance to join the cluster, and if SMF fails to get the node destination from 
any of the nodes, the campaign fails as seen in the log below. When reading the 
node destination there is a 10 sec “try again” loop waiting for “node up” for 
each node. It is not unlikely that the active SC-node comes up before some of 
the other nodes and that it takes more than 10 sec after that before some of 
the other nodes join the cluster. If that is the case, the campaign will fail.
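
As an illustration of the "try again" loop described above with the timeout made configurable, here is a small sketch. It assumes hypothetical callbacks for the "node up" check and for sleeping; none of the names come from the SMF code, and the fake node helper exists only to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks: query "node up" and sleep between attempts. */
typedef bool (*node_up_fn)(const char *node, void *ctx);
typedef void (*sleep_fn)(unsigned seconds);

/* Poll until the node is up or timeout_s seconds have been spent waiting.
 * timeout_s would come from configuration instead of a fixed 10 sec. */
static bool wait_for_node(const char *node, unsigned timeout_s,
                          unsigned interval_s, node_up_fn is_up, void *ctx,
                          sleep_fn do_sleep) {
  assert(node != NULL && is_up != NULL && do_sleep != NULL && interval_s > 0);
  unsigned waited = 0;
  for (;;) {
    if (is_up(node, ctx)) return true;     /* node has joined the cluster */
    if (waited >= timeout_s) return false; /* give up: campaign would fail */
    do_sleep(interval_s);
    waited += interval_s;
  }
}

/* Demo helper: a fake node that reports "up" on its Nth poll. */
typedef struct {
  unsigned polls_until_up;
  unsigned polls;
} fake_node_t;

static bool fake_is_up(const char *node, void *ctx) {
  (void)node;
  fake_node_t *f = ctx;
  f->polls++;
  return f->polls >= f->polls_until_up;
}

/* Demo helper: skip real sleeping so the sketch runs instantly. */
static void no_sleep(unsigned seconds) { (void)seconds; }
```

With a one second interval, a node that needs 15 polls is missed with a 10 sec timeout but found with a 60 sec one, which is the situation the ticket describes.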


---


[tickets] [opensaf:tickets] #1924 PLM: resurrect PLM test suite

- **Milestone**: 4.7.2 --> 5.1.FC



---

** [tickets:#1924] PLM: resurrect PLM test suite**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Jul 20, 2016 08:20 PM UTC by Alex Jones
**Last Updated:** Thu Aug 04, 2016 11:23 AM UTC
**Owner:** Alex Jones


The PLM test suite is currently removed from the build because it doesn't 
compile. It can't even be run because it needs specific OpenHPI and IMM 
configuration.

This ticket aims to resurrect the PLM test suite.


---


[tickets] [opensaf:tickets] #1994 IMMSv: Finalized CCB are counted under Max Ccb Limit

- **status**: accepted --> review
- **Milestone**: 5.1.RC2 --> 5.1.RC1



---

** [tickets:#1994] IMMSv: Finalized CCB are counted under Max Ccb Limit**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Thu Sep 01, 2016 12:32 PM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 10:09 AM UTC
**Owner:** Neelakanta Reddy


setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
1PBE with 30K objects

- Default maxCcb is configured to 1 as in object 
opensafImm=opensafImm,safApp=safImmService
- Try creating more than 1 Ccb operation
~~~
for (( i = 1; i <= 2; i++ )); do
   immcfg -c TestClass testClass=$i
done
~~~
The above operation fails with ERR_NO_RESOURCES once the Ccb count for the 
cluster reaches 1. Even though the max limit has been reached, more Ccbs are 
allowed after a few minutes. See the syslog snippet below.



Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45008 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45009 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45010 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45011 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45012 COMMITTED 
(chaniTestClass)
**Sep  1 *14:58:35* OSAF-SC1 osafimmnd[27298]: *NO ERR_NO_RESOURCES: maximum 
Ccbs limit 2 has been reached for the cluster***
Sep  1 15:00:34 OSAF-SC1 syslog-ng[1194]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=92951', processed='center(received)=47084', 
processed='destination(messages)=47077', processed='destination(mailinfo)=7', 
processed='destination(mailwarn)=0', 
processed='destination(localmessages)=45786', 
processed='destination(newserr)=0', processed='destination(mailerr)=0', 
processed='destination(netmgm)=0', processed='destination(warn)=42', 
processed='destination(console)=16', processed='destination(null)=0', 
processed='destination(mail)=7', processed='destination(xconsole)=16', 
processed='destination(firewall)=0', processed='destination(acpid)=0', 
processed='destination(newscrit)=0', processed='destination(newsnotice)=0', 
processed='source(src)=47084'
**Sep  1 *15:10:14 *OSAF-SC1 osafimmnd[27298]: *NO Ccb 45014 COMMITTED 
(chaniTestClass)***
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45015 COMMITTED 
(chaniTestClass)
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45016 COMMITTED 
(chaniTestClass)
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45017 COMMITTED 
(chaniTestClass)
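
A rough sketch of the behaviour the ticket title asks for, assuming a simplified CCB table: only CCBs that are still active count against the limit, so committed or aborted (finalized) CCBs do not block new ones. The states and names below are hypothetical and do not mirror the ImmModel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CCB life-cycle states. */
typedef enum { CCB_ACTIVE, CCB_COMMITTED, CCB_ABORTED } ccb_state_t;

typedef struct {
  unsigned id;
  ccb_state_t state;
} ccb_sketch_t;

/* Count only CCBs that are still in progress. */
static size_t count_active_ccbs(const ccb_sketch_t *ccbs, size_t n) {
  assert(ccbs != NULL || n == 0);
  size_t active = 0;
  for (size_t i = 0; i < n; i++) {
    if (ccbs[i].state == CCB_ACTIVE) active++;
  }
  return active;
}

/* Admit a new CCB only while the number of active CCBs is below the limit. */
static bool ccb_admit(const ccb_sketch_t *ccbs, size_t n, size_t max_ccbs) {
  return count_active_ccbs(ccbs, n) < max_ccbs;
}
```

Under this assumption, a table holding only committed CCBs would still admit a new one even with the limit set to 1.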



---


[tickets] [opensaf:tickets] #2029 imm: fevs message lost during failover

2016-09-13 Thread Hung Nguyen




---

** [tickets:#2029] imm: fevs message lost during failover**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Tue Sep 13, 2016 11:05 AM UTC
**Owner:** nobody
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2029/attachment/logs.7z) 
(256.4 kB; application/octet-stream)


There's fevs message loss when failing over between 2 SCs.


~~~
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 232 <754, 2010f> (@OpenSafImmPBE)
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 233 <755, 2010f> (OsafImmPbeRt_B)
...
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer disconnected 233 <755, 
2010f> (OsafImmPbeRt_B)
~~~


The IMMNDs never receive the D2ND_DISCARD_IMPL for @OpenSafImmPBE, so that 
applier stays marked as dying.


~~~
Sep  8 11:50:02 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:03 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:04 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
Sep  8 11:59:08 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:09 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:10 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
~~~


The details of the problem are explained here:


http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjAKFElnhCTUywCYDxo5EUN0A5LAZzzzACMB7AD2S8AbgFMw5bDgA0TKgC45OecgDKAFQCCrAEIBNZAFpJC5JoDC61ADUAogB0EjgGajhbZDF4BXJOORQHjgADAAsAMwAbJxMrB6GAMRgogAmAHwmlCqu7gD6ALaibGwgAOainDwCQmISclk5Hl6+EP6ByCERAOyVfIIi-vXyAFSjbKIIKcj53DDTbKXIzrwSjfOLneFdjtzeKFAoXoUeELwmOMgANiCtMRSURgncl96iGbHs2W5sBQvIbBAQPlgKlkAB3A4ACw6YS2vWqAzqFGUqghEBg0NOZksNls-0BrUcOz2yEhIDYCAA5ChSrwUBBIaJprN1kswLx8pkiQg2GcGUy1s0-BJ2gCoJdLjCItE8B94klUu9kV88scSuV4f1aud5IKfMKAkFYdsnAhuKIYCBvONzvjxZKyRTqchkjBRFAxFN+cy5vkFtJHAd8UDgCdGUtvqy0dDNibHJoYBBvCAJQBPaTIb1rP2LNiQnyXKbm4PA0HRmHhUIADjuUkez1eSpYnwjqr+AJDZahUrhXD6NUGmDiiiH7ACpQQKyZKW8wEusBuoOzfwAFLGAJTc3mZ8NqspM2Nsjm29qXXgA5AAQivN+vd9vfYR2qUI1GrvdYh9rOWq0jOZ7cYIOo4Z6i0bQeCmyQgCkqYAdyADqmjIOgRTqkyQoQPIh4ANQdFeAC8AF4EAA


~~~
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA IMMND DOWN on active controller 2 
detected at standby immd!! 1. Possible failover
...
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA Message count:10437 + 1 != 10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA DISCARD DUPLICATE FEVS message:10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA Error code 2 returned for message 
type 82 - ignoring
~~~
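
To make the "DISCARD DUPLICATE FEVS message" line easier to read, here is a sketch of a generic sequence-number check, assuming a receiver that only accepts the message numbered exactly one above the last one it delivered. It is an illustration only, with hypothetical names, and not the IMMND code: if a distinct message is sent with a number that has already been used, a check like this drops it as a duplicate, which is one way a message can be lost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts for an incoming numbered message. */
typedef enum { FEVS_ACCEPT, FEVS_DUPLICATE, FEVS_GAP } fevs_verdict_t;

/* Accept msg_no only if it is exactly last_seen + 1.
 * On accept, *last_seen is advanced; otherwise it is left unchanged. */
static fevs_verdict_t fevs_check(uint64_t *last_seen, uint64_t msg_no) {
  assert(last_seen != NULL);
  if (msg_no <= *last_seen) {
    return FEVS_DUPLICATE; /* number already delivered: message is discarded */
  }
  if (msg_no != *last_seen + 1) {
    return FEVS_GAP; /* one or more messages are missing */
  }
  *last_seen = msg_no;
  return FEVS_ACCEPT;
}
```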


The logs are attached.


---


[tickets] [opensaf:tickets] #1431 PLM: support virtualization of EEs

- **status**: assigned --> fixed
- **Milestone**: 5.2.FC --> 5.1.FC
- **Comment**:

Closing this ticket so that it appears in the list of OpenSAF 5.1 enhancements. 
Please open a new ticket for OpenSAF 5.1.FC if you wish to continue working on 
this feature.



---

** [tickets:#1431] PLM: support virtualization of EEs**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Fri Jul 31, 2015 03:34 PM UTC by Alex Jones
**Last Updated:** Mon Aug 29, 2016 12:07 PM UTC
**Owner:** Alex Jones


This ticket is for adding virtualization support for EEs in PLM.


---


[tickets] [opensaf:tickets] #1988 AMF: Admin operation continuation does not work with short cluster init timeout

- **Milestone**: 5.2.FC --> 5.1.RC2



---

** [tickets:#1988] AMF: Admin operation continuation does not work with short 
cluster init timeout**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 12:04 AM UTC by Minh Hon Chau
**Last Updated:** Mon Sep 12, 2016 01:26 AM UTC
**Owner:** Minh Hon Chau


In the admin continuation scenario after headless, if 
saAmfClusterStartupTimeout is configured with a short value, the admin 
continuation is initiated when saAmfClusterStartupTimeout expires while the SU 
is still OUT OF SERVICE. The eventual result is failure of the admin operation 
after headless.


---


[tickets] [opensaf:tickets] #2028 log: write_log_record_hdl get bad file descriptor

- **status**: accepted --> review



---

** [tickets:#2028] log: write_log_record_hdl get bad file descriptor**

**Status:** review
**Milestone:** 5.0.1
**Created:** Tue Sep 13, 2016 09:54 AM UTC by Vu Minh Nguyen
**Last Updated:** Tue Sep 13, 2016 09:54 AM UTC
**Owner:** Vu Minh Nguyen


In the current code, logsv passes the `WRITE REQUEST` to the handle thread even 
when the file descriptor is invalid.
Here is the relevant code from log_stream_write_h()@lgs_stream.cc
``` C
log_initiate_stream_files(stream);

if (*stream->p_fd == -1) {
  TRACE("%s - Initiating stream files \"%s\" Failed", __FUNCTION__,
stream->name.c_str());
} else {
  TRACE("%s - stream files initiated", __FUNCTION__);
}
```
In that case (`p_fd = -1`), `log_stream_write_h` should tell the client to 
TRY_AGAIN by returning the value `(-2)`.
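
A minimal sketch of that early return, assuming a simplified stream structure. Only the `-2` value comes from the ticket; the structure, the macro names and `stream_write_sketch` are hypothetical, and the counter merely stands in for handing the request to the file handle thread.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_WRITE_OK 0
#define SKETCH_TRY_AGAIN (-2) /* value named in the ticket */

/* Hypothetical, reduced view of a log stream. */
typedef struct {
  int fd;                 /* -1 means "no valid file descriptor" */
  int requests_forwarded; /* stands in for requests sent to the file thread */
} write_stream_sketch_t;

/* Refuse to forward a write request when the descriptor is invalid. */
static int stream_write_sketch(write_stream_sketch_t *s, const char *rec,
                               size_t len) {
  assert(s != NULL && rec != NULL);
  (void)len;
  if (s->fd == -1) {
    return SKETCH_TRY_AGAIN; /* client retries later; nothing is forwarded */
  }
  s->requests_forwarded++; /* real code would pass the record on here */
  return SKETCH_WRITE_OK;
}
```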

Besides, there is another problem at file closing. Look at the functions 
`fileclose_hdl` and `fileclose_h`. The file descriptor should be set to 
`invalid` in `fileclose_hdl`; otherwise the `close file` request will be 
re-sent to the file handle thread even though the file is already closed.
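
The closing side can be sketched the same way, assuming a simplified stream structure: the descriptor is marked invalid as part of the close, so a repeated close request is not issued for a file that is already closed. The names are hypothetical and the real `fileclose_hdl` / `fileclose_h` split is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical, reduced view of a log stream. */
typedef struct {
  int fd;             /* -1 means "already closed / invalid" */
  int close_requests; /* number of close requests actually issued */
} close_stream_sketch_t;

/* Close the stream's file once; later calls are no-ops. */
static int stream_close_sketch(close_stream_sketch_t *s) {
  assert(s != NULL);
  if (s->fd == -1) {
    return 0; /* nothing to close, do not send another request */
  }
  int fd = s->fd;
  s->fd = -1; /* invalidate first so the request is never repeated */
  s->close_requests++;
  return close(fd);
}
```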

The above cases usually happen when the file system is busy.  Osaflogd TRACE:

> 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or 
> resource busy
> 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or 
> resource busy
> 2016-07-02 00:32:52 SC-1 osaflogd[460]: ER write_log_record_hdl - write 
> FAILED: Bad file descriptor





---


[tickets] [opensaf:tickets] #1837 TIPC: loading model gives: "osafimmpbed: ER Failed in saImmOmSearchNext_2:5 - exiting" and "osafimmpbed: ER immpbe.cc dumpObjectsToPbe failed - exiting (line:265)

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1837] TIPC: loading model gives: "osafimmpbed: ER Failed in 
saImmOmSearchNext_2:5 - exiting" and "osafimmpbed: ER immpbe.cc 
dumpObjectsToPbe failed - exiting (line:265)**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed May 18, 2016 05:41 AM UTC by beatriz brandao
**Last Updated:** Tue Aug 30, 2016 08:58 AM UTC
**Owner:** nobody
**Attachments:**

- 
[C:\Docs\lixo\osaftestLog-2016-04-19_04-04-26.gz](https://sourceforge.net/p/opensaf/tickets/1837/attachment/C%3A%5CDocs%5Clixo%5CosaftestLog-2016-04-19_04-04-26.gz)
 (1.4 MB; application/x-gzip-compressed)


Testcase:
osaftest.tests.amf.functest.config_changes.test_comptype_attr_chg.Test.test_chg_ct_def_disable_restart
Note: this testcase is run with TIPC enabled.

Testcase starts @:
2016-04-19 03:44:28 INFO - TestCase:setUp Start | 
test_chg_ct_def_disable_restart (osaftest.tests.amf.functest.
config_changes.test_comptype_attr_chg.Test)

Testcase ends @:
2016-04-19 03:45:16 DEBUG: Powered off cluster

First analysis done by Zoran:
From syslogs, I cannot see what caused the ERR_TIMEOUT in 
searchNext().

According to MDS logs, it seems that this might be an MDS problem.

From MDS logs:
Apr 19  3:44:36.237379 osaflogd[446] NOTIFY  |MDTM: svc up event for svc_id = 
LGA(21), subscri. by svc_id = LGS(20) 
pwe_id=1 Adest = 
Apr 19  3:44:36.238518 osafntfd[461] NOTIFY  |MDTM: svc up event for svc_id = 
NTFA(29), subscri. by svc_id = NTFS(28) 
pwe_id=1 Adest = 
Apr 19  3:44:36.239261 osafclmd[477] NOTIFY  |MDTM: svc up event for svc_id = 
CLMA(35), subscri. by svc_id = CLMS(34) 
pwe_id=1 Adest = 
Apr 19  3:44:38.788267 osaflogd[446] NOTIFY  |MDTM: svc up event for svc_id = 
LGA(21), subscri. by svc_id = LGS(20) 
pwe_id=1 Adest = 
Apr 19  3:44:44.911298 osafimmpbed[453] ERR  |MDS_SND_RCV: Timeout or Error 
occured
Apr 19  3:44:44.912049 osafimmpbed[453] ERR  |MDS_SND_RCV: Timeout occured on 
sndrsp message
Apr 19  3:44:44.912128 osafimmpbed[453] ERR  |MDS_SND_RCV: 
Adest=<0x0002010f,1637493776>
Apr 19  3:44:44.919827 osafimmnd[432] NOTIFY  |MDTM: svc down event for svc_id 
= IMMA_OM(26), subscri. by svc_id = 
IMMND(25) pwe_id=1 Adest = 
Apr 19  3:44:45.413550 osafimmpbed[679] NOTIFY  |BEGIN MDS LOGGING| 
PID= | ARCHW=a|64bit=1

There was no MDS message between 3:44:38.788267 and 3:44:44.911298.

At 3:44:44.911298, the MDS send/receive for the PBE request timed out.



---


[tickets] [opensaf:tickets] #1895 ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 12 not found

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1895] ntf: ER in syslog: ER NtfAdmin::subscriptionRemoved client 
12 not found**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Fri Jun 24, 2016 03:24 AM UTC by Vo Minh Hoang
**Last Updated:** Thu Sep 01, 2016 11:33 AM UTC
**Owner:** Canh Truong
**Attachments:**

- 
[osafntfd.txt](https://sourceforge.net/p/opensaf/tickets/1895/attachment/osafntfd.txt)
 (705.5 kB; text/plain)


Failed when running test suite 2: 

osafntfd [463:ntfs_evt.c:0338] >> proc_unsubscribe_msg: client_id 28, 
subscriptionId 111
osafntfd [463:NtfAdmin.cc:0553] ER NtfAdmin::subscriptionRemoved client 28 not 
found
osafntfd [463:ntfs_evt.c:0341] << proc_unsubscribe_msg 

Currently, when finalizing the last client, ntfa uninstalls the MDS connection.
This causes the NCSMDS_DOWN event to be sent to ntfs, and ntfs removes 
all clients related to this MDS connection.
But if a new client is initialized immediately after finalizing, ntfs may 
receive the initialization message before the NCSMDS_DOWN event. As a result, 
the new client is removed without being finalized, and the subsequent subscribe 
fails.


Similar ticket: https://sourceforge.net/p/opensaf/tickets/1818/


---


[tickets] [opensaf:tickets] #1929 osaf: Build fails with GCC 6.1.0

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1929] osaf: Build fails with GCC 6.1.0**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Tue Aug 02, 2016 09:21 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Aug 30, 2016 08:57 AM UTC
**Owner:** A V Mahesh (AVM)


OpenSAF fails to build with GCC 6.1.0, due to new compiler warnings:
# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.1.0/configure --prefix=/usr --enable-shared 
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu 
--enable-languages=c,c++ --disable-multilib --disable-bootstrap 
--with-system-zlib --with-gmp=/usr/local/gmp-6.1.1 
--with-mpfr=/usr/local/mpfr-3.1.4 --with-mpc=/usr/local/mpc-1.0.3
Thread model: posix
gcc version 6.1.0 (GCC)


make[5]: Entering directory `/avm/opensaf/osaf/tools/safimm/immdump'
g++ -DHAVE_CONFIG_H -I. -I../../../..  -DSA_EXTENDED_NAME_SOURCE 
-I../../../../osaf/libs/saf/include -I../../../../osaf/libs/core/include 
-I../../../../osaf/libs/core/leap/include 
-I../../../../osaf/libs/core/mds/include 
-I../../../../osaf/libs/core/common/include  
-I../../../../osaf/libs/common/immsv/include  -Wall -fno-strict-aliasing 
-Werror -fPIC -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -fstack-protector 
-DINTERNAL_VERSION_ID='""'  -I/usr/include/libxml2 -g -O2 -MT 
immdump-imm_dumper.o -MD -MP -MF .deps/immdump-imm_dumper.Tpo -c -o 
immdump-imm_dumper.o `test -f 'imm_dumper.cc' || echo './'`imm_dumper.cc
imm_dumper.cc: In function ‘int main(int, char**)’:
imm_dumper.cc:144:5: error: this ‘if’ clause does not guard... 
[-Werror=misleading-indentation]
 if ((c = getopt_long(argc, argv, "hp:x:c:", long_options, NULL)) == -1)
 ^~
imm_dumper.cc:147:13: note: ...this statement, but the latter is misleadingly 
indented as if it is guarded by the ‘if’
 switch (c) {
 ^~
cc1plus: all warnings being treated as errors
make[5]: *** [immdump-imm_dumper.o] Error 1
make[5]: Leaving directory `/avm/opensaf/osaf/tools/safimm/immdump'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/avm/opensaf/osaf/tools/safimm'
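
For reference, a sketch of an option loop shaped so that `-Wmisleading-indentation` has nothing to complain about: the statement guarded by the `if` gets explicit braces, and the `switch` sits at the loop's own indentation level. It assumes the guarded statement is a `break`; the long option names and `count_options` are hypothetical, and only the `"hp:x:c:"` option string is taken from the error output.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Count recognized options; returns -1 on an unknown option. */
static int count_options(int argc, char **argv) {
  /* Hypothetical long options, for illustration only. */
  static const struct option long_options[] = {
      {"help", no_argument, NULL, 'h'},
      {"pbe", required_argument, NULL, 'p'},
      {NULL, 0, NULL, 0}};
  int c;
  int seen = 0;

  assert(argv != NULL);
  optind = 1; /* restart scanning from the first argument */
  while (1) {
    if ((c = getopt_long(argc, argv, "hp:x:c:", long_options, NULL)) == -1) {
      break; /* braces make it explicit what the if guards */
    }
    switch (c) { /* same indentation as the if: clearly not guarded by it */
    case 'h':
    case 'p':
    case 'x':
    case 'c':
      seen++;
      break;
    default:
      return -1;
    }
  }
  return seen;
}
```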


---


[tickets] [opensaf:tickets] #1968 SMF does not handle AMF long DN&RDN support

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1968] SMF does not handle AMF long DN&RDN support**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Wed Aug 24, 2016 12:51 PM UTC by elunlen
**Last Updated:** Mon Sep 12, 2016 01:11 PM UTC
**Owner:** elunlen


SMF already supports long DN. However, there are some checks regarding AMF 
related objects that do not allow some DNs to be longer than 255 (RDN 64). 
These checks shall be removed since AMF will support long DN from 5.1.


---


[tickets] [opensaf:tickets] #1983 plm: Build failure with gcc 6.1.1 on 32-bit system

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1983] plm: Build failure with gcc 6.1.1 on 32-bit system**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Mon Aug 29, 2016 08:44 PM UTC by Anders Widell
**Last Updated:** Tue Aug 30, 2016 08:56 AM UTC
**Owner:** nobody


PLM fails to build with gcc 6.1.1 on a 32-bit system:

~~~
make[3]: Entering directory '/home/opensaf/opensaf-staging/tests/plmsv/plms'
  CC   plmtest-test_saPlmEntityGroupAdd.o
test_saPlmEntityGroupAdd.c: In function 'saPlmEntityGroupAdd_05':
test_saPlmEntityGroupAdd.c:57:28: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
 rc=saPlmEntityGroupAdd((SaPlmEntityGroupHandleT)&entityGroupHandle, 
&f120_slot_1_dn , entityNamesNumber,SA_PLM_GROUP_SINGLE_ENTITY);
^
cc1: all warnings being treated as errors
Makefile:638: recipe for target 'plmtest-test_saPlmEntityGroupAdd.o' failed
~~~
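
The warning appears when a pointer is cast directly to an integer type of a different width. If the test only needs an arbitrary (invalid) handle value derived from an address, one way to express that without the warning is to go through `uintptr_t` first. This is a sketch under the assumption that the handle type is a 64-bit integer; `entity_group_handle_t` is a hypothetical stand-in for `SaPlmEntityGroupHandleT`, and this is not necessarily how the test suite was fixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the handle type, assumed to be 64 bits wide. */
typedef uint64_t entity_group_handle_t;

/* Convert an address to a handle-sized integer without a size-mismatch cast:
 * pointer -> uintptr_t (same width as a pointer) -> 64-bit handle. */
static entity_group_handle_t handle_from_pointer(const void *p) {
  return (entity_group_handle_t)(uintptr_t)p;
}
```

The intermediate `uintptr_t` step is the same on 32-bit and 64-bit builds, so the code does not depend on the pointer width.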


---


[tickets] [opensaf:tickets] #1986 log: logtest fails when run after immomtest

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1986] log: logtest fails when run after immomtest**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Tue Aug 30, 2016 09:03 AM UTC by Anders Widell
**Last Updated:** Tue Aug 30, 2016 09:06 AM UTC
**Owner:** Vu Minh Nguyen


If I first run immomtest and then logtest, I get the following result:

~~~

Suite 1: Library Life Cycle
    1  PASSED   saLogInitialize() OK
    2  PASSED   saLogInitialize() with NULL pointer to handle
    3  PASSED   saLogInitialize() with NULL pointer to callbacks
    4  PASSED   saLogInitialize() with NULL callbacks AND version
    5  PASSED   saLogInitialize() with uninitialized handle
    6  PASSED   saLogInitialize() with uninitialized version
    7  PASSED   saLogInitialize() with too high release level
    8  PASSED   saLogInitialize() with minor version set to 1
    9  PASSED   saLogInitialize() with major version set to 3
   10  PASSED   saLogSelectionObjectGet() OK
   11  PASSED   saLogSelectionObjectGet() with NULL log handle
   12  PASSED   saLogDispatch() OK
   13  PASSED   saLogFinalize() OK
   14  PASSED   saLogFinalize() with NULL log handle

Suite 2: Log Service Operations
    1  PASSED   saLogStreamOpen_2() system stream OK
    2  PASSED   saLogStreamOpen_2() notification stream OK
    3  PASSED   saLogStreamOpen_2() alarm stream OK
    4  PASSED   Create app stream OK
    5  PASSED   Create and open app stream
    6  PASSED   saLogStreamOpen_2() - NULL ptr to handle
    7  PASSED   saLogStreamOpen_2() - NULL logStreamName
    8  PASSED   Open app stream second time with altered logFileName
    9  PASSED   Open app stream second time with altered logFilePathName
   10  PASSED   Open app stream second time with altered logFileFmt
   11  PASSED   Open app stream second time with altered maxLogFileSize
   12  PASSED   Open app stream second time with altered maxLogRecordSize
   13  PASSED   Open app stream second time with altered maxFilesRotated
   14  PASSED   Open app stream second time with altered haProperty
   15  PASSED   Open app with logFileFmt == NULL
   16  PASSED   Open app stream second time with logFileFmt == NULL
   17  PASSED   Open app stream with NULL logFilePathName
   18  PASSED   Open app stream with '.' logFilePathName
   19  PASSED   Open app stream with invalid logFileFmt
   20  PASSED   Open app stream with unsupported logFullAction
   21  PASSED   Open non exist app stream with NULL create attrs
   22  PASSED   saLogStreamOpenAsync_2(), Not supported
   23  PASSED   saLogStreamOpenCallbackT() OK
   24  PASSED   saLogWriteLog(), Not supported
   25  PASSED   saLogWriteAsyncLog() system OK
   26  PASSED   saLogWriteAsyncLog() alarm OK
   27  PASSED   saLogWriteAsyncLog() notification OK
   28  PASSED   saLogWriteAsyncLog() with NULL logStreamHandle
   29  PASSED   saLogWriteAsyncLog() with invalid logStreamHandle
   30  PASSED   saLogWriteAsyncLog() with invalid ackFlags
   31  PASSED   saLogWriteAsyncLog() with NULL logRecord ptr
   32  PASSED   saLogWriteAsyncLog() logSvcUsrName == NULL
   33  PASSED   saLogWriteAsyncLog() logSvcUsrName == NULL and envset
   34  PASSED   saLogWriteAsyncLog() with logTimeStamp set
   35  PASSED   saLogWriteAsyncLog() without logTimeStamp set
   36  PASSED   saLogWriteAsyncLog() 1800 bytes logrecord (ticket #203)
   37  PASSED   saLogWriteAsyncLog() invalid severity
   38  PASSED   saLogWriteLogAsync() logBufSize > strlen(logBuf) + 1
   39  PASSED   saLogWriteLogAsync() logBufSize > SA_LOG_MAX_RECORD_SIZE
   40  PASSED   saLogWriteLogCallbackT() SA_DISPATCH_ONE
   41  PASSED   saLogWriteLogCallbackT() SA_DISPATCH_ALL
   42  PASSED   saLogFilterSetCallbackT OK
   43  PASSED   saLogStreamClose OK
   44  PASSED   saLogStreamOpen_2 with maxFilesRotated = 0, ERR
   45  PASSED   saLogStreamOpen_2 with maxFilesRotated = 128, ERR
   46  PASSED   saLogStreamOpen_2 with logFileName > 218 characters, ERR
   47  PASSED   saLogStreamOpen_2 with invalid filename
   48  PASSED   saLogStreamOpen_2 with maxLogRecordSize > MAX_RECSIZE, ERR
   49  PASSED   saLogStreamOpen_2 with maxLogRecordSize < 150, ERR
   50  PASSED   saLogStreamOpen_2 with stream number out of the limitation, ERR
   51  PASSED   saLogInitialize() then saLogFinalize() multiple times. keep MDS connection, OK
   52  PASSED   saLogInitialize() then saLogFinalize() multiple times in multiple threads, OK

Suite 3: Limit Fetch API
    1  PASSED   saLogLimitGet(), Not supported

Suite 4: LOG OI tests, stream objects
    1  PASSED   CCB Object Modify saLogStreamFileName
    2  PASSED   CCB Object Modify saLogStreamPathName, ERR not allowed
3  PASSED   CCB O

[tickets] [opensaf:tickets] #1990 AMF : Extra notification is received for lock operation on unlocked SG.

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1990] AMF :  Extra notification is received for lock operation on 
unlocked SG.**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 06:40 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 06:40 AM UTC
**Owner:** nobody


Changeset : 5.1 FC (7997 changeset)

 Extra notification is received for lock operation on unlocked SG.
 
 amf-adm lock safSg=AmfDemo,safApp=AmfDemo
===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_UNLOCKED
New State: SA_AMF_ADMIN_LOCKED

===  Aug 30 15:22:27 - State Change  ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=AmfDemo,safApp=AmfDemo"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=AmfDemo,safApp=AmfDemo changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED



---


[tickets] [opensaf:tickets] #1991 AMF: Existing PG tracking should not be stopped for CURRENT flag

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1991] AMF: Existing PG tracking should not be stopped  for CURRENT 
flag**

**Status:** unassigned
**Milestone:** 5.1.RC2
**Created:** Wed Aug 31, 2016 09:44 AM UTC by Srikanth R
**Last Updated:** Wed Aug 31, 2016 09:44 AM UTC
**Owner:** nobody


5.1.FC : changeset - 6997

Issue : Existing PG tracking should not be stopped  for CURRENT call


Steps performed :

-> Call saAmfInitialize_4()
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CHANGES flag.
-> Call saAmfProtectionGroupTrack_4() with SA_TRACK_CURRENT flag.
-> Call saAmfProtectionGroupTrackStop()


Observed output :

saAmfProtectionGroupTrackStop() returns SA_AIS_ERR_NOT_EXIST, as if tracking
had never been started.


Expected output:

saAmfProtectionGroupTrackStop() should return SA_AIS_OK, as it did in the
earlier release.

According to the B04.01 spec, section 7.11.1 (page 318), tracking must not be
stopped until saAmfProtectionGroupTrackStop() is called explicitly:

Once saAmfProtectionGroupTrack_4() has been called with trackFlags
containing either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY, notification
callbacks can only be stopped by an invocation of
saAmfProtectionGroupTrackStop().
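
The expected behaviour can be sketched as a small state model. This is only an
illustration and not the AMF agent code: `struct pg_tracker`, `pg_track()`,
`pg_track_stop()` and the `TRACK_*` / `RC_*` constants are hypothetical names
introduced here, with values chosen for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values and return codes, for this sketch only. */
enum { TRACK_CURRENT = 0x01, TRACK_CHANGES = 0x02, TRACK_CHANGES_ONLY = 0x04 };
enum { RC_OK = 1, RC_ERR_NOT_EXIST = 12 };

/* Tracking state for one protection group in a hypothetical agent. */
struct pg_tracker {
  bool tracking; /* true while change notifications are enabled */
};

/* Model of a Track() call: CHANGES or CHANGES_ONLY starts tracking.
 * A CURRENT-only call is a one-shot query and leaves the state untouched. */
static int pg_track(struct pg_tracker *t, unsigned flags) {
  assert(t != NULL);
  if (flags & (TRACK_CHANGES | TRACK_CHANGES_ONLY)) {
    t->tracking = true;
  }
  return RC_OK;
}

/* Model of TrackStop(): succeeds only if tracking was started earlier. */
static int pg_track_stop(struct pg_tracker *t) {
  assert(t != NULL);
  if (!t->tracking) {
    return RC_ERR_NOT_EXIST;
  }
  t->tracking = false;
  return RC_OK;
}
```

With this model, the sequence CURRENT, CHANGES, CURRENT, TrackStop ends with
`pg_track_stop()` returning `RC_OK`, which matches the expected output above.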



---


[tickets] [opensaf:tickets] #1993 amf: amfnd crashes during su lock if CSI attribute name or value is a long dn.

- **Milestone**: 5.1.RC1 --> 5.1.RC2

---

** [tickets:#1993] amf: amfnd crashes during su lock if CSI attribute name or
value is a long dn.**

**Status:** review
**Milestone:** 5.1.RC2
**Created:** Thu Sep 01, 2016 11:09 AM UTC by Praveen
**Last Updated:** Fri Sep 09, 2016 12:24 PM UTC
**Owner:** Praveen
**Attachments:**

-
[amfnd_crash.tgz](https://sourceforge.net/p/opensaf/tickets/1993/attachment/amfnd_crash.tgz)
(69.4 kB; application/x-compressed)

Configuration:
In the long DN AMF demo, add a CSI attribute to the CSI whose attribute value
is a long DN.
1) Bring the configuration up.
2) Lock the SU.
3) AMFND crashes.

AMFND uses memcpy() and thus works with the original CSI attribute values from
csi_rec.
It frees that memory in avsv_amf_cbk_free() when the CSI_SET callback arrives.
During SU lock, it tries to free the same memory again while deleting the
record.
In AMFND and AMFD, all SaNameT handling should be done using the
osaf_extended_name_alloc() API.

The issue also applies to messages related to the CSI attribute change
callback.
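
The ownership problem described above can be sketched in plain C. This is a
minimal illustration under assumptions, not the AMFND code: `struct csi_attr`,
`csi_attr_copy()` and `csi_attr_free()` are hypothetical names, and a plain
heap string stands in for a long DN held in an SaNameT.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CSI attribute whose value may be a long DN. */
struct csi_attr {
  char *value; /* heap-allocated and owned by this struct */
};

/* Deep copy: the destination gets its own allocation, so the callback data
 * and the stored record can be freed independently.
 * Returns 0 on success, -1 if the allocation fails. */
static int csi_attr_copy(struct csi_attr *dst, const struct csi_attr *src) {
  assert(dst != NULL && src != NULL && src->value != NULL);
  size_t len = strlen(src->value) + 1;
  dst->value = (char *)malloc(len);
  if (dst->value == NULL) {
    return -1;
  }
  memcpy(dst->value, src->value, len);
  return 0;
}

/* Free the value and clear the pointer, so a repeated call is harmless. */
static void csi_attr_free(struct csi_attr *attr) {
  assert(attr != NULL);
  free(attr->value);
  attr->value = NULL;
}
```

A shallow memcpy() of the struct would leave both copies pointing at the same
buffer, which is the double-free pattern reported in this ticket.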

---


[tickets] [opensaf:tickets] #1997 IMM: immnd fails to update si while bringing up opensaf with 2PBE

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1997] IMM: immnd fails to update si while bringing up opensaf with 
2PBE**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Fri Sep 02, 2016 11:46 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 01:37 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[LogAMF.zip](https://sourceforge.net/p/opensaf/tickets/1997/attachment/LogAMF.zip)
 (432.4 kB; application/zip)


setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
2PBE enabled

Bring up OpenSAF on a controller with 2PBE enabled. IMMND reports the errors
below.
Attachments: syslog, amfd and immnd traces

Sep  2 16:54:13 SLOT1 osafimmpbed: WA Start prepare for ccb: 
10004/4294967300 towards slave PBE returned: '12' from Immsv
Sep  2 16:54:13 SLOT1 osafimmpbed: WA PBE-A failed to prepare PRTA update 
Ccb:10004/4294967300 towards PBE-B
Sep  2 16:54:13 SLOT1 osafimmpbed: NO 2PBE Error (18) in PRTA update 
(ccbId:10004)
**Sep  2 16:54:13 SLOT1 osafimmnd[3632]: WA update of PERSISTENT runtime 
attributes in object 'safSi=NoRed3,safApp=OpenSAF' REVERTED. PBE rc:18
Sep  2 16:54:13 SLOT1 osafamfd[3698]: ER exec: update FAILED 18**
Sep  2 16:54:14 SLOT1 osafimmnd[3632]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db

Notes:
1. OpenSAF starts successfully.
2. The issue is not seen with 1PBE.

Once the controller is up, amf-state si gives:

safSi=SC-2N,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=NoRed4,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed1,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=NoRed2,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)
safSi=NoRed3,safApp=OpenSAF
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)




---


[tickets] [opensaf:tickets] #1994 IMMSv: Finalized CCB are counted under Max Ccb Limit

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#1994] IMMSv: Finalized CCB are counted under Max Ccb Limit**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Thu Sep 01, 2016 12:32 PM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 06:55 AM UTC
**Owner:** Neelakanta Reddy


setup:
Version - OpenSAF 5.1.FC : changeset - 7997
4-Node cluster
1PBE with 30K objects

- Default maxCcb is configured to 1 as in object 
opensafImm=opensafImm,safApp=safImmService
- Try to create more than one Ccb:
~~~
for (( i = 1 ; i <= 2; i++ )); do
   immcfg -c TestClass testClass=$i
done
~~~
The operation above fails with ERR_NO_RESOURCES after the Ccb count for the
cluster reached 1. Even though the max limit has been reached, more Ccbs are
allowed after a few minutes. See the syslog snippet below:



Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45008 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45009 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45010 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45011 COMMITTED 
(chaniTestClass)
Sep  1 14:58:35 OSAF-SC1 osafimmnd[27298]: NO Ccb 45012 COMMITTED 
(chaniTestClass)
**Sep  1 *14:58:35* OSAF-SC1 osafimmnd[27298]: *NO ERR_NO_RESOURCES: maximum 
Ccbs limit 2 has been reached for the cluster***
Sep  1 15:00:34 OSAF-SC1 syslog-ng[1194]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=92951', processed='center(received)=47084', 
processed='destination(messages)=47077', processed='destination(mailinfo)=7', 
processed='destination(mailwarn)=0', 
processed='destination(localmessages)=45786', 
processed='destination(newserr)=0', processed='destination(mailerr)=0', 
processed='destination(netmgm)=0', processed='destination(warn)=42', 
processed='destination(console)=16', processed='destination(null)=0', 
processed='destination(mail)=7', processed='destination(xconsole)=16', 
processed='destination(firewall)=0', processed='destination(acpid)=0', 
processed='destination(newscrit)=0', processed='destination(newsnotice)=0', 
processed='source(src)=47084'
**Sep  1 *15:10:14 *OSAF-SC1 osafimmnd[27298]: *NO Ccb 45014 COMMITTED 
(chaniTestClass)***
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45015 COMMITTED 
(chaniTestClass)
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45016 COMMITTED 
(chaniTestClass)
Sep  1 15:10:14 OSAF-SC1 osafimmnd[27298]: NO Ccb 45017 COMMITTED 
(chaniTestClass)
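
The behaviour the ticket title asks for can be sketched as an admission check
that counts only CCBs that are still open. This is an illustration under
assumptions, not the IMMND implementation: `enum ccb_state`,
`count_active_ccbs()` and `ccb_admit()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CCB life-cycle states for this sketch. */
enum ccb_state { CCB_ACTIVE, CCB_COMMITTED, CCB_ABORTED };

/* Count the CCBs that are still open; terminated ones are skipped. */
static size_t count_active_ccbs(const enum ccb_state *ccbs, size_t n) {
  size_t active = 0;
  for (size_t i = 0; i < n; i++) {
    if (ccbs[i] == CCB_ACTIVE) {
      active++;
    }
  }
  return active;
}

/* Admit a new CCB only while the number of open CCBs is below the limit. */
static bool ccb_admit(const enum ccb_state *ccbs, size_t n, size_t max_ccbs) {
  assert(ccbs != NULL || n == 0);
  return count_active_ccbs(ccbs, n) < max_ccbs;
}
```

With a limit of 1, a table holding only committed or aborted CCBs still admits
a new one, while a table holding one active CCB does not.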



---


[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the 
controller**

**Status:** fixed
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 13, 2016 10:07 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog)
 (716.7 kB; application/octet-stream)
- 
[Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog)
 (696.4 kB; application/octet-stream)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :
--
A cluster reset happened because the assertion
'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.

Steps followed & Observed behaviour
--
1.  Invoked failover.
2.  After a few successful failovers, the new active controller rebooted
because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.
While the previous active controller was rejoining the cluster in the standby
role, a cluster reset happened.
[Timeline: Sep  6 00:13:02 sofo-s2]

Sep  6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, 
dest:13)
Sep  6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: 
osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' 
failed.
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: NO 
'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: ER 
safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:
1. Syslog attached
2. msgnd & msgd trace not enabled


---


[tickets] [opensaf:tickets] #2000 msg: Cluster reset happend due to msgd crashed on both the controller

2016-09-13 Thread A V Mahesh (AVM)

- **status**: review --> fixed
- **Comment**:

changeset:   8064:99410ba8cc21
parent:  8061:da089e8f337c
user:Ramesh 
date:Tue Sep 13 15:01:43 2016 +0530
summary: msg: memset ilist_info and track_info to avoid garbage [#2000]
 
changeset:   8065:019e617955ef
branch:  opensaf-5.1.x
tag: tip
parent:  8063:59a5226122ed
user:Ramesh 
date:Tue Sep 13 15:02:23 2016 +0530
summary: msg: memset ilist_info and track_info to avoid garbage [#2000]
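
The changeset summary mentions zeroing structures to avoid garbage. A minimal
sketch of why that matters for a length check follows. It does not reproduce
the msgd code: `struct track_info` here is a hypothetical layout, and
`MAX_NAME_LEN`, `track_info_init()` and `track_info_name_valid()` are names
assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 256 /* hypothetical limit for this sketch */

/* Hypothetical record carrying a length-prefixed name. */
struct track_info {
  unsigned short name_len;
  char name[MAX_NAME_LEN];
};

/* Zero the whole record before use, so that fields which are never filled in
 * hold 0 instead of whatever was previously in that memory. */
static void track_info_init(struct track_info *info) {
  assert(info != NULL);
  memset(info, 0, sizeof(*info));
}

/* The kind of check that fails on garbage: the length must be below the
 * limit. */
static bool track_info_name_valid(const struct track_info *info) {
  assert(info != NULL);
  return info->name_len < MAX_NAME_LEN;
}
```

A record left uninitialized may hold any length value and fail the check,
whereas a record passed through `track_info_init()` always passes it.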



---

** [tickets:#2000] msg: Cluster reset happend due to msgd crashed on both the 
controller**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Tue Sep 06, 2016 06:04 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 13, 2016 06:04 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[Active_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Active_syslog)
 (716.7 kB; application/octet-stream)
- 
[Standby_syslog](https://sourceforge.net/p/opensaf/tickets/2000/attachment/Standby_syslog)
 (696.4 kB; application/octet-stream)


Environment details
--
OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :
--
A cluster reset happened because the assertion
'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.

Steps followed & Observed behaviour
--
1.  Invoked failover.
2.  After a few successful failovers, the new active controller rebooted
because the assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' failed in msgd.
While the previous active controller was rejoining the cluster in the standby
role, a cluster reset happened.
[Timeline: Sep  6 00:13:02 sofo-s2]

Sep  6 00:13:02 sofo-s2 osafimmd[3985]: NO MDS event from svc_id 24 (change:5, 
dest:13)
Sep  6 00:13:02 sofo-s2 osafmsgd[4145]: osaf_extended_name.c:139: 
osaf_extended_name_length: Assertion 'length < SA_MAX_UNEXTENDED_NAME_LENGTH' 
failed.
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: NO 
'safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: ER 
safComp=MQD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  6 00:13:02 sofo-s2 osafamfnd[4046]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  6 00:13:02 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

Notes:
1. Syslog attached
2. msgnd & msgd trace not enabled


---


[tickets] [opensaf:tickets] #2001 IMM: Owner handle is getting corrupt when OmAdminOperationInvoke retruns ERR_TIMEOUT

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#2001] IMM: Owner handle is getting corrupt when 
OmAdminOperationInvoke retruns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Tue Sep 06, 2016 07:14 AM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 01:38 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[AdminCbkTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2001/attachment/AdminCbkTmOut.zip)
 (95.1 kB; application/zip)


OS : Suse 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes 1 PBE enabled

Summary:
Steps to Reproduce
1. Invoke saImmOmAdminOperationInvokeAsync_2() while the callback waits for
longer than the OI_CALLBACK_TIMEOUT value
2. Invoke saImmOmAdminOperationInvokeAsync_2() again without waiting, OR
invoke any Ccb operation

Observed Behavior:
Step 1 returns SA_AIS_ERR_TIMEOUT (expected)
Step 2 returns SA_AIS_ERR_BAD_HANDLE (SA_AIS_OK is expected)

Sep  6 12:22:27 SLOT1 python2.5: logtrace: trace enabled to file 
/tmp/imma_oi_callbacktimeout.trace, mask=0x
Sep  6 12:22:27 SLOT1 python2.5: NO IMMA library TRACE initialize done pid:1147 
svid:26 file:/tmp/imma_oi_callbacktimeout.trace
Sep  6 12:22:27 SLOT1 osafimmnd[838]: NO Implementer connected: 14 
(testOiTmout_verifyAdminOpCallback_37) <343, 2010f>
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA IMMND - Client went down so no response
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 12:22:42 SLOT1 osafimmnd[838]: ER Problem in sending to peer IMMND over 
MDS. Discarding admin op reply.
Sep  6 12:22:42 SLOT1 osafimmnd[838]: WA Error code 2 returned for message type 
21 - ignoring
Sep  6 12:22:47 SLOT1 osafimmnd[838]: WA IMMND - Client 1468878946575 went down 
on syncronous request, discarding request
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer locally disconnected. 
Marking it as doomed 14 <343, 2010f> (testOiTmout_verifyAdminOpCallback_37)
Sep  6 12:22:47 SLOT1 osafimmnd[838]: NO Implementer disconnected 14 <343, 
2010f> (testOiTmout_verifyAdminOpCallback_37)


Note: **Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached


---


[tickets] [opensaf:tickets] #2013 IMM: Search Handle getting corrupt when saImmOmSearchNext_2() returns ERR_TIMEOUT

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#2013] IMM: Search Handle getting corrupt when 
saImmOmSearchNext_2() returns ERR_TIMEOUT**

**Status:** assigned
**Milestone:** 5.1.RC2
**Created:** Thu Sep 08, 2016 12:10 PM UTC by Chani Srivastava
**Last Updated:** Tue Sep 13, 2016 01:38 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[SearchTmOut.zip](https://sourceforge.net/p/opensaf/tickets/2013/attachment/SearchTmOut.zip)
 (883.9 kB; application/zip)


OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes

Summary:
Steps to Reproduce
1. Create a runtime/config object
2. Do SearchInitialize()
3. Delete the object created in Step1
4. Do SearchNext() 
5. Do SearchNext() again 


Observed Behavior:
Step 4 returns SA_AIS_ERR_TIMEOUT (expected)
Step 5 returns SA_AIS_ERR_BAD_HANDLE (**SA_AIS_ERR_NOT_EXIST is expected**)

**Note: Test passed in OpenSAF release 5.0**

Agent traces and immnd, immd traces attached


---


[tickets] [opensaf:tickets] #2017 Update the SMF PR document with information about faster upgrade

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#2017] Update the SMF PR document with information about faster 
upgrade**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Fri Sep 09, 2016 12:22 PM UTC by elunlen
**Last Updated:** Fri Sep 09, 2016 01:21 PM UTC
**Owner:** elunlen


Update the SMF PR document with information about:
* Balanced In Service Upgrade (BISU) [#1685]
* Parallel swBundle removal and installation [#1633]
* NG lock and unlock in single step upgrade [#1634]


---


[tickets] [opensaf:tickets] #2024 Imm doc: updattion of IMMsv PR document for 5.1

- **Milestone**: 5.1.RC1 --> 5.1.RC2



---

** [tickets:#2024] Imm doc: updattion of IMMsv PR document for 5.1**

**Status:** accepted
**Milestone:** 5.1.RC2
**Created:** Mon Sep 12, 2016 06:17 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Sep 12, 2016 06:17 AM UTC
**Owner:** Neelakanta Reddy


This defect is for updating the IMMsv PR document.


---


[tickets] [opensaf:tickets] #2028 log: write_log_record_hdl get bad file descriptor




---

** [tickets:#2028] log: write_log_record_hdl get bad file descriptor**

**Status:** accepted
**Milestone:** 5.0.1
**Created:** Tue Sep 13, 2016 09:54 AM UTC by Vu Minh Nguyen
**Last Updated:** Tue Sep 13, 2016 09:54 AM UTC
**Owner:** Vu Minh Nguyen


In the current code, logsv passes the `WRITE REQUEST` to the handle thread even
when the file descriptor is invalid.
Here is the relevant code from log_stream_write_h() in lgs_stream.cc:
``` C
log_initiate_stream_files(stream);

if (*stream->p_fd == -1) {
  TRACE("%s - Initiating stream files \"%s\" Failed", __FUNCTION__,
stream->name.c_str());
} else {
  TRACE("%s - stream files initiated", __FUNCTION__);
}
```
In that case (`p_fd = -1`), `log_stream_write_h` should tell the client to
TRY_AGAIN by returning the value `(-2)`.

Besides, there is another problem when closing files. Look at the functions
`fileclose_hdl` and `fileclose_h`. The file descriptor should be set to
`invalid` in `fileclose_hdl`; otherwise the `close file` request is re-sent to
the file handle thread even though the file is already closed.
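
Both proposals can be sketched together in standalone C. This is not the logsv
code: `struct stream`, `stream_write()`, `stream_close()` and the `WRITE_*`
constants are hypothetical names, and only the value `-2` for the try-again
case is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical return codes; -2 mirrors the TRY_AGAIN value named above. */
enum { WRITE_OK = 0, WRITE_FAILED = -1, WRITE_TRY_AGAIN = -2 };

/* Hypothetical stream holding one file descriptor; -1 means "no open file". */
struct stream {
  int fd;
};

/* Refuse to pass a write on when the descriptor is invalid. */
static int stream_write(struct stream *s, const void *buf, size_t len) {
  assert(s != NULL);
  if (s->fd == -1) {
    return WRITE_TRY_AGAIN;
  }
  ssize_t n = write(s->fd, buf, len);
  return (n == (ssize_t)len) ? WRITE_OK : WRITE_FAILED;
}

/* Close the file and mark the descriptor invalid, whatever close() reports,
 * so that a later close request becomes a no-op instead of being re-sent. */
static int stream_close(struct stream *s) {
  assert(s != NULL);
  if (s->fd == -1) {
    return 0; /* already closed */
  }
  int rc = close(s->fd);
  s->fd = -1;
  return rc;
}
```

Invalidating the descriptor even when close() fails is a design choice of this
sketch; it prevents a later write from using a stale descriptor.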

These cases usually happen when the file system is busy. Osaflogd TRACE:

> 2016-07-02 00:32:48 SC-1 osaflogd[460]: NO fileclose failed Device or 
> resource busy
> 2016-07-02 00:32:50 SC-1 osaflogd[460]: NO fileclose failed Device or 
> resource busy
> 2016-07-02 00:32:52 SC-1 osaflogd[460]: ER write_log_record_hdl - write 
> FAILED: Bad file descriptor





---


[tickets] [opensaf:tickets] #2019 amf: Unit tests fail to build

- **status**: review --> fixed
- **assigned_to**: Long HB Nguyen -->  nobody 
- **Comment**:

changeset:   8061:da089e8f337c
tag: tip
parent:  8059:9eb1e54daa76
user:Long Nguyen
date:Tue Sep 13 19:12:26 2016 +1000
summary: amf: Unit tests fail to build [#2019]

changeset:   8060:901da236b68c
branch:  opensaf-5.1.x
parent:  8058:a2d3ea8d848f
user:Long Nguyen 
date:Tue Sep 13 19:10:22 2016 +1000
summary: amf: Unit tests fail to build [#2019]




---

** [tickets:#2019] amf: Unit tests fail to build**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Fri Sep 09, 2016 01:08 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 04:33 AM UTC
**Owner:** nobody


"make check" fails (32-bit system, GCC version 6.1.1, googletest version 
48ee8e98abc950abd8541e15550b18f8f6cfb3a9):

~~~
make[8]: Entering directory 
'/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfd/tests'
  CXX  testamfd-test_ckpt_enc_dec.o
In file included from test_ckpt_enc_dec.cc:22:0:
/home/opensaf/googletest/googletest/include/gtest/gtest.h: In instantiation of 
'testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const 
char*, const T1&, const T2&) [with T1 = unsigned int; T2 = int]':
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1421:23:   required 
from 'static testing::AssertionResult 
testing::internal::EqHelper::Compare(const char*, const 
char*, const T1&, const T2&) [with T1 = unsigned int; T2 = int; bool 
lhs_is_null_literal = false]'
test_ckpt_enc_dec.cc:354:3:   required from here
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1392:11: error: 
comparison between signed and unsigned integer expressions 
[-Werror=sign-compare]
   if (lhs == rhs) {
   ^~
/home/opensaf/googletest/googletest/include/gtest/gtest.h: In instantiation of 
'testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const 
char*, const T1&, const T2&) [with T1 = long long unsigned int; T2 = long long 
int]':
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1421:23:   required 
from 'static testing::AssertionResult 
testing::internal::EqHelper::Compare(const char*, const 
char*, const T1&, const T2&) [with T1 = long long unsigned int; T2 = long long 
int; bool lhs_is_null_literal = false]'
test_ckpt_enc_dec.cc:362:3:   required from here
/home/opensaf/googletest/googletest/include/gtest/gtest.h:1392:11: error: 
comparison between signed and unsigned integer expressions 
[-Werror=sign-compare]
cc1plus: all warnings being treated as errors
Makefile:814: recipe for target 'testamfd-test_ckpt_enc_dec.o' failed
make[8]: *** [testamfd-test_ckpt_enc_dec.o] Error 1
make[8]: Leaving directory 
'/home/opensaf/opensaf-staging/osaf/services/saf/amf/amfd/tests'
~~~


---


[tickets] [opensaf:tickets] #1987 AMF: Admin operation continuation of nodegroup entity leaves SG unstable