from:"Anders Bjornerstedt"

[tickets] [opensaf:tickets] #2418 imm: Info of dead IMMND remains in standby IMMD

2017-04-13 Thread Anders Bjornerstedt

If the defect only occurs in a headless system, then I think the ticket slogan, 
or at least the description, should say so.


---

** [tickets:#2418] imm: Info of dead IMMND remains in standby IMMD**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Apr 10, 2017 10:23 AM UTC by Hung Nguyen
**Last Updated:** Thu Apr 13, 2017 09:49 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2418/attachment/log.tgz) 
(149.4 kB; application/x-compressed)


When the standby IMMD comes up at the same time as an IMMND is exiting, the info 
of that IMMND might not be removed from **immnd_tree** of the standby IMMD.

Details of the problem are explained in the sequence diagram below
[sequence 
diagram](http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICCBhAKgWgJIFl8ARAKFElnhCWQGVMAhPQ0kkAIwHsAPZTgNwCmYOo2bFkAYjCCAJgC5kRAPIB1AHLJBQmgDMwnALbIC+dUT4JkCTrMHIAGiRJdeA4aKamii3AigoxLQAOgh+Acj4DOjI1LLIAM6CgdHIBgA29hCcdBBx7ACezvReLMjYAHxoWOIW8uic6bIJBQgwaYIAjgCuggkQziQYON7lVSW1ig1NLW0dCcCcCEmhEAAW9qbmyOlQ-chQbenddgnI65uE20vWtvZOzhw8fEIiw7VSMgpKapragnoDMYthYbjY7I5nK4Xh53t4ADQTbyKTAbExXCx7DqGdzxfRGaojMrsbooGSGECHM6HTy1IA)

SC-5 was Active, SC-2 was Standby, IMMND on SC-1 was exiting

~~~
18:35:03 SC-1 osafimmnd[441]: exiting for shutdown

18:35:03 SC-2 osafrded[413]: NO RDE role set to STANDBY
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:568511936070075)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:567412424442298)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:566312912814523)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, 
dest:565213401186744)

18:35:03 SC-5 osafimmd[433]: NO MDS event from svc_id 25 (change:4, 
dest:564113889558969)
~~~

Down event for IMMND@SC-1 was received on SC-5 but not on SC-2.


**The symptoms:**

1. If the down IMMND is the coordinator, then when that standby IMMD becomes 
active it fails to elect a new coordinator, as there's already a 
coordinator in the **immnd_tree**.
~~~
18:35:11 SC-2 osafimmd[430]: WA IMMND coordinator at 2050f apparently crashed 
=> electing new coord
~~~
No further logs about a newly elected coordinator were printed.


2. When IMMND@SC-1 comes up again, it fails to introduce itself to the IMMD 
because the IMMD already has IMMND@SC-1 in **immnd_tree** with a wrong epoch.

~~~
18:35:29 SC-1 osafimmnd[441]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> 
IMM_SERVER_CLUSTER_WAITING
18:35:29 SC-1 osafimmnd[441]: NO This IMMND is now the NEW Coord
18:35:29 SC-1 osafimmnd[441]: ER 3 > 0, exiting
~~~




---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #2398 imm: retry of ccb abort should be allowed if failed with TRY_AGAIN and TIMEOUT

2017-03-27 Thread Anders Bjornerstedt

TRY_AGAIN is OK.

But ERR_TIMEOUT? Not sure what you are doing here.

The two error codes are different in meaning.

TRY_AGAIN means the client KNOWS that the call was NOT processed.

TIMEOUT means the client does NOT KNOW if the call was processed in the server 
or not.
TIMEOUT may be (is typically) generated in the client library when the client 
has waited too long for a response from the server.
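
To make the difference concrete, here is a minimal sketch of a retry loop that treats the two codes differently. The `Rc` and `Outcome` enums and `call_with_retry` are hypothetical stand-ins made up for illustration; they are not the real `SaAisErrorT` values or any function in the IMM client library.

```cpp
#include <cassert>
#include <functional>

// Stand-in for the relevant return codes; NOT the real saAis.h enum.
enum class Rc { Ok, TryAgain, Timeout, Other };

// What the caller knows about the request once the loop ends.
enum class Outcome { Done, NotProcessed, Unknown, Failed };

// Retries only on TryAgain, because only then is the call known to be
// unprocessed. Timeout ends the loop: the server may or may not have acted.
Outcome call_with_retry(const std::function<Rc()>& call, int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    switch (call()) {
      case Rc::Ok:       return Outcome::Done;
      case Rc::TryAgain: continue;  // safe to repeat: nothing happened
      case Rc::Timeout:  return Outcome::Unknown;
      case Rc::Other:    return Outcome::Failed;
    }
  }
  return Outcome::NotProcessed;  // still TryAgain after all attempts
}
```

A caller that gets `Outcome::Unknown` cannot simply repeat the request; it first has to find out whether the original call took effect.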



---

** [tickets:#2398] imm: retry of ccb abort should be allowed if failed with 
TRY_AGAIN and TIMEOUT**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Mar 27, 2017 07:50 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Mar 27, 2017 08:02 AM UTC
**Owner:** Neelakanta Reddy


Steps:
1. Create a CCB.
2. Call saImmOmCcbAbort on the CCB; the return code should be TRY_AGAIN, which 
can be reproduced when the FEVS queue is full:
T2 Too many pending incoming FEVS messages (> 16) enqueueing async message. 
Backlog:1

saImmOmCcbAbort then creates the imma_newCcbId without finalizing the old 
ccbid.

Solution:
Do not create a new ccbid when the return code is TRY_AGAIN or TIMEOUT.



[tickets] [opensaf:tickets] #2393 Immd got crashed on Active as immnd restarted on Active with cluster having single controller and payload

2017-03-23 Thread Anders Bjornerstedt

Note also that the IMMD does not "crash", it exits.

Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER IMM RELOAD with NO persistent back 
end => ensure cluster restart by IMMD exit at both SCs, exiting


---

** [tickets:#2393] Immd got crashed on Active as immnd restarted on Active with 
cluster having single controller and payload**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:58 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 23, 2017 05:37 PM UTC
**Owner:** nobody
**Attachments:**

- 
[PL-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/PL-3.tar.bz2)
 (558.9 kB; application/x-bzip)
- 
[SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/SC-1.tar.bz2)
 (2.5 MB; application/x-bzip)


### Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
2 nodes setup(1 controller and 1 payload)

### Summary
Immd got crashed on Active as immnd restarted on Active with cluster having 
single controller and payload

### Steps followed & Observed behaviour
1. Bring up cluster with 1 controller and 1 payload 
2. Kill immnd on active controller 
3. Observed that immd crashed on the active controller (SC-1), due to which the 
payload also got rebooted

**Issue observed when there is only one controller**

**Syslog**
SC-1:::

Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 600 ns)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO Restarting a component of 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Mar 23 11:06:12 SO-SLOT-1 osafsmfd[2235]: WA DispatchOiCallback: 
saImmOiDispatch() Fail 'SA_AIS_ERR_BAD_HANDLE (9)'
Mar 23 11:06:12 SO-SLOT-1 osafntfimcnd[2181]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: WA IMMND coordinator at 2010f 
apparently crashed => electing new coord
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Failed to find candidate for new 
IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:2
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Active IMMD has to restart the 
IMMSv. All IMMNDs will restart
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER IMM RELOAD with NO persistent back 
end => ensure cluster restart by IMMD exit at both SCs, exiting
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
Mar 23 11:06:12 SO-SLOT-1 opensaf_reboot: Rebooting local node; timeout=60

PL-3:::
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2280]: ER IMMND forced to restart on order 
from IMMD, exiting
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 600 ns)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO Restarting a component of 
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 
'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: mkfifo already exists: 
/var/lib/opensaf/osafimmnd.fifo File exists
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: Started
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: WA AMF director unexpectedly crashed
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: Rebooting OpenSAF NodeId = 131855 EE 
Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, 
OwnNodeId = 131855, SupervisionTime = 60

Traces:
From the traces on the Active: 'Failed to find candidate for new IMMND coordinator' and 
'Active IMMD has to restart the IMMSv'
~~~
Mar 23 11:06:12.535325 osafimmd [2138:src/imm/immd/immd_evt.c:2638] T5 Received 
IMMND service event
Mar 23 11:06:12.535349 osafimmd [2138:src/imm/immd/immd_evt.c:2741] T5 PROCESS 
MDS EVT: NCSMDS_DOWN, my PID:2138
Mar 23 11:06:12.535451 osafimmd [2138:src/imm/immd/immd_evt.c:2748] T5 
NCSMDS_DOWN => local IMMND down
Mar 23 11:06:12.535463 osafimmd [2138:src/imm/immd/immd_evt.c:2763] T5 IMMND 
DOWN PROCESS detected by IMMD
Mar 23 11:06:12.535475 osafimmd [2138:src/imm/immd/immd_proc.c:0618] >> 
immd_process_immnd_down
Mar 23 11:06:12.535483 osafimmd [2138:src/imm/immd/immd_proc.c:0621] T5 
immd_process_immnd_down pid:2149 on-active:1 cb->immnd_coord:2010f
Mar 23 11:06:12.535503 osafimmd [2138:src/imm/immd/immd_proc.c:0628] WA IMMND 
coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12.535516 osafimmd
~~~

[tickets] [opensaf:tickets] #2393 Immd got crashed on Active as immnd restarted on Active with cluster having single controller and payload

2017-03-23 Thread Anders Bjornerstedt

- **Comment**:

Unless this ticket describes a system that has been configured to allow 
headless/SC-absence, the above is expected behavior and this ticket is 
invalid.

I see no mention of headless/SC-absence.

The cluster has to reload because the IMMND at a payload cannot take on the 
role of coordinator IMMND in a normal configuration.
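
As a rough illustration of that rule, the sketch below selects a new coordinator from the remaining IMMNDs. It is a simplified model, not the actual IMMD election code: `ImmndInfo` and `elect_coord` are hypothetical names, and the real election considers more state (epochs, sync status) than shown here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of one IMMND as seen by the IMMD.
struct ImmndInfo {
  unsigned node_id;
  bool on_controller;  // true if the IMMND runs on an SC
};

// Returns the first eligible candidate, or nullptr if there is none.
// A payload IMMND is eligible only when SC absence (headless) is allowed.
const ImmndInfo* elect_coord(const std::vector<ImmndInfo>& nodes,
                             bool sc_absence_allowed) {
  for (const ImmndInfo& n : nodes) {
    if (n.on_controller || sc_absence_allowed) return &n;
  }
  return nullptr;  // no candidate: the IMM service has to reload
}
```

With one controller and one payload, the payload's IMMND is the only one left once the controller's IMMND goes down, so with SC absence disabled no candidate is found, which matches the "Failed to find candidate" log line in the ticket.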




[tickets] [opensaf:tickets] #2382 imm: reducing log level for ccb-committed messages

2017-03-16 Thread Anders Bjornerstedt

First, this ticket should not be a defect.
The log level of the ccb commit messages is intentional, the motive being to 
have a record of if and when a CCB was committed. 

Second, having a record of configuration changes at the OpenSAF level is 
normally necessary for analyzing a reported problem involving OpenSAF. Many 
problems are triggered by a configuration change. Having a persistent record of 
such configuration changes is crucial for understanding or debugging unexpected 
events or problems in a system.
Such troubleshooting does not just cover troubleshooting of OpenSAF, but also 
troubleshooting of application-level behavior when the configuration of such an 
application is changed.

Log level NOtice is the lowest log level that is pushed to the syslog by 
default in OpenSAF.
This ticket in fact goes further than just lowering the log level to INfo 
(which is normally not logged but can be toggled on); it argues for lowering it 
to trace!

So you could end up in a scenario where there is a serious incident on a 
system, but no way to see from the OpenSAF logs if there was any configuration 
change involved in triggering the problem. You would need to reproduce the 
problem to get trace or INfo log level enabled.
The problem with trace is that the volumes are so large that it sometimes 
impacts the behavior of the system, sometimes making it difficult to reproduce 
the problem.

CCB traffic is very low during normal operation. Only during SMF campaigns 
or manual reconfigurations of the system would there be CCB traffic of any 
significance.
So log messages of committed CCBs can hardly be a big issue in terms of 
volume, in general.

In summary:
I argue that this ticket is not motivated and it is by definition not a defect, 
since the current behavior is intentional and well motivated.

The motive behind this ticket should be analyzed better and explained better in 
the ticket.
Or the ticket may just be closed.

A slightly better alternative is to introduce a new configuration parameter to 
specify if CCB commits are to be logged. The default of that configuration 
parameter must of course be OFF (current behavior as the default).
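
One possible reading of that alternative, sketched under assumptions: the parameter is an opt-in switch that, while OFF, leaves the commit record at NOtice as today. The `quietCcbCommit` flag, the `Level` enum and both helper functions are invented for illustration; no such parameter exists in OpenSAF as far as this thread shows.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the NOtice, INfo and trace levels discussed above.
enum class Level { Notice, Info, Trace };

// Hypothetical configuration; false keeps the current NOtice-level record.
struct ImmLogConfig {
  bool quietCcbCommit = false;
};

// Chooses the level for the "Ccb ... COMMITTED" message.
Level ccb_commit_level(const ImmLogConfig& cfg) {
  return cfg.quietCcbCommit ? Level::Info : Level::Notice;
}

// Builds the same text that the existing LOG_NO call prints.
std::string ccb_commit_message(unsigned ccbId, const std::string& adminOwner) {
  return "Ccb " + std::to_string(ccbId) + " COMMITTED (" + adminOwner + ")";
}
```

The point of the default is that an unmodified installation still gets a persistent record of every committed CCB.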






---

** [tickets:#2382] imm: reducing log level for ccb-committed messages**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Mar 16, 2017 09:26 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Mar 16, 2017 09:47 AM UTC
**Owner:** Neelakanta Reddy


    if (i != sOwnerVector.end()) {
        LOG_NO("Ccb %u COMMITTED (%s)", ccb->mId,
               (*i)->mAdminOwnerName.c_str());
    } else {
        LOG_NO("Ccb %u COMMITTED (%s)", ccb->mId, "");
    }

Reduce the LOG_NO to TRACE



[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects

2017-03-10 Thread Anders Bjornerstedt

The slogan of this ticket and the analysis done in this ticket were misleading, 
since the observed error really had nothing specifically to do with deletion of 
objects, but rather with the setting of adminOwner over many objects.

Most likely that confusion stems from observations done using the immcfg tool, 
which maps a tool-level delete to more than one IMM API call. 

The slogan of the ticket could have been changed, but there is no way to delete 
an incorrect analysis.



---

** [tickets:#2284] IMM: Improper return code without any error string while 
deleting large number of objects**

**Status:** invalid
**Milestone:** 5.2.RC1
**Created:** Wed Feb 01, 2017 07:13 AM UTC by Chani Srivastava
**Last Updated:** Fri Mar 10, 2017 06:17 AM UTC
**Owner:** nobody


Steps to reproduce:

1. Bring up opensaf on a cluster
2. Create around 10k objects
3. Try deleting these objects in one immcfg operation

Output:
Error Returned - error - saImmOmAdminOwnerSet FAILED: SA_AIS_ERR_LIBRARY (2)

No error string stating the cause of failure is returned.

Syslog - immcfg: ER TOO MANY Object Names line:733

Expected behavior - Proper return code with error string should be returned 



[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects

2017-03-03 Thread Anders Bjornerstedt

To summarize, the 10k limit is not per CCB but per adminOwnerSet call.

The limit has nothing to do with avoiding "corruption in IMM".
It simply has to do with the size of some messages being sent over the system.
The ticket needs to be rewritten/redefined. Probably best to close this one 
and maybe open a new ticket. 

I agree that ERR_LIBRARY is not the correct return code for this case. 
ERR_NO_RESOURCES would be better. The documentation probably needs an update 
explaining the limit on the number of objects covered (explicitly, or 
implicitly by subtree recursion) by an admin-owner-set.

An enhancement could in theory be defined to implement support for setting 
admin owner over a larger number of objects using one IMM API call. But that 
use case is very rare outside of testing, and a workaround should exist for the 
application (or immtools internally): generate more than one admin-owner-set 
call, still within the same CCB.
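
The workaround can be sketched as a simple batching step on the client side. This is only an illustration: `split_into_batches` is a hypothetical helper, and the per-call limit is passed in as a parameter rather than hard-coded, since the exact value should be taken from the IMM documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits object names into batches of at most max_per_call entries, so that
// each batch can be handed to its own admin-owner-set call.
std::vector<std::vector<std::string>> split_into_batches(
    const std::vector<std::string>& names, std::size_t max_per_call) {
  std::vector<std::vector<std::string>> batches;
  if (max_per_call == 0) return batches;  // no valid batch size
  for (std::size_t i = 0; i < names.size(); i += max_per_call) {
    const std::size_t end = std::min(names.size(), i + max_per_call);
    batches.emplace_back(names.begin() + i, names.begin() + end);
  }
  return batches;
}
```

The application would then issue one admin-owner-set call per batch before adding the delete operations to a single CCB.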



[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects

2017-03-03 Thread Anders Bjornerstedt

Note also that it is adminOwnerSet that fails with ERR_LIBRARY

saImmOmAdminOwnerSet FAILED: SA_AIS_ERR_LIBRARY (2)

and not saImmOmCcbObjectDelete.



[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects

2017-03-03 Thread Anders Bjornerstedt

I am not aware of any size limitation for CCBs in IMM as such. Even if there 
were, exceeding it would result in some kind of explicit resource/timeout error 
for that case and absolutely not "database corruption". 

There IS a size limitation for the database's total number of objects. If I 
remember correctly, 300K objects of average size 300 bytes (?); it's in the 
IMM_README.

There may be (probably is) a limit on CCB size for the immcfg tool?

Most likely the problem observed here *is* due to some kind of library issue. 
Someone should check the imm library code for adminOwnerSet for the cases that 
can return ERR_LIBRARY.






[tickets] [opensaf:tickets] #2323 imm: CCB operations fail after SC absence (Headless)

2017-03-01 Thread Anders Bjornerstedt

- **summary**: imm: CCB operations fail after SC absence --> imm: CCB 
operations fail after SC absence (Headless)
- **Comment**:

Added "headless" clarification because "SC absence" can be misunderstood as 
just one (out of normally two) SCs being absent.



---

** [tickets:#2323] imm: CCB operations fail after SC absence (Headless)**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Feb 23, 2017 03:36 PM UTC by Hung Nguyen
**Last Updated:** Wed Mar 01, 2017 07:04 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- 
[logs_n_traces.tgz](https://sourceforge.net/p/opensaf/tickets/2323/attachment/logs_n_traces.tgz)
 (658.6 kB; application/gzip)


Reproduce steps:
~~~
1. Start SC-1
2. Commit some CCBs
# immcfg -c Test test=0
# immcfg -c Test test=1
# immcfg -c Test test=2
# immcfg -c Test test=3
3. Start PL-3
4. Restart SC-1
5. When SC-1 is back, it fails to add operations to CCB
# immcfg -c Test test=10
error - saImmOmCcbObjectCreate_2 FAILED with SA_AIS_ERR_FAILED_OPERATION 
(21)
OI reports: IMM: Resource abort: CCB is not in an expected state
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
~~~

**cb->mLatestCcbId** was not updated on PL-3 when it joined the cluster, so it 
still had a value of zero.

When SC-1 was back from headless, IMMND on PL-3 sent a re-introduce message to 
IMMD on SC-1 with **cb->mLatestCcbId = 0**.

IMMD failed to update **cb->ccb_id_count**, so when a new CCB is created, it 
starts from **0+1** instead of **mLatestCcbId + 1**.

That results in a conflict with the CCB in **sCcbVector** and the CCB 
operation failure.
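
The counter handling described above can be modeled in a few lines. This is a sketch of the idea only, assuming the IMMD keeps the highest CCB id reported in any (re-)introduce message; `ImmdCb`, `on_immnd_intro` and `next_ccb_id` are hypothetical names, and only `ccb_id_count` is taken from the ticket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the IMMD control block; only the counter is shown.
struct ImmdCb {
  std::uint32_t ccb_id_count = 0;
};

// Called for each (re-)introduce message. Keeps the highest CCB id reported
// by any IMMND so that a later allocation does not reuse an id.
void on_immnd_intro(ImmdCb& cb, std::uint32_t latest_ccb_id) {
  cb.ccb_id_count = std::max(cb.ccb_id_count, latest_ccb_id);
}

// Allocates the id for the next CCB.
std::uint32_t next_ccb_id(ImmdCb& cb) { return ++cb.ccb_id_count; }
```

If the only IMMND present after headless reports 0, as PL-3 did, the next id is 1 and collides with a CCB that already exists; if it reports the true latest id, the next id continues from there.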

Logs and traces are attached.



[tickets] [opensaf:tickets] #2229 imm:disable pbe should honor critical ccbs

2016-12-14 Thread Anders Bjornerstedt

- **Comment**:

I have several problems with this ticket.

First, the problem description is both incorrect and incomplete.
Point 7 is incorrect because the alternative and simplest way to clear the 
issue is to re-enable the PBE. It even says so in the pasted warning message in 
the ticket.

Second, the description is incomplete because it does not describe the 
application (point 3) in any detail. It just says run "multiple" ccb operations.
The application types supported by IMM for CCBs are (a) operator-initiated 
configuration changes, (b) operator-initiated management procedures, and 
(c) upgrade campaigns.

Now both (a) and (b) are defined as being limited in size and time. 
Put another way, if the configuration change is massive, then it probably 
should go into a campaign. Or if the "configuration change" is some kind of 
high throughput continuous
... h test (?), then that is not a valid test in itself.
The SAF IMM service is not designed to support high-throughput applications.
If you nevertheless insist on using some kind of automated continuous 
CCB-generating application (which by definition is not using the IMM for 
storing just config data), then at the very least any upgrade campaign needs to 
be made aware of the nonconformant application so that the campaign can quiesce 
the deviant application before starting the upgrade proper. But of course the 
improper application should not be there in the first place.

A proposed "fix" has been sent for review. But as I understand that fix, it 
does not fix the problem. It only reduces the likelihood of it persisting. It's 
a timer-based solution.
So the fix is an "enhancement"-type fix for a problem that lies outside the 
scope of what the SAF IMM service is intended to support. It is then possibly 
an "enhancement".

But I would still argue that it is a "bad" enhancement since it does not truly 
remove the problem and it invites misunderstanding/misuse of the IMM service.




---

** [tickets:#2229] imm:disable pbe should honor critical ccbs**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Wed Dec 14, 2016 09:29 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Dec 14, 2016 09:47 AM UTC
**Owner:** Neelakanta Reddy


Reproducible steps:
1. Bring up the cluster with PBE configured.
2. Enable PBE.
3. Run multiple CCB operations in parallel.
4. Disable PBE.
5. On one of the payloads/controllers, restart the immnd/node.
6. Sync will be aborted with the following messages:
 WA PBE has been disabled with ccbs in critical state - To resolve: Enable PBE 
or resart/reload the cluster
  NO Still waiting for existing Ccbs to terminate after 20.027520 seconds. 
Aborting this sync attempt
7. The IMMND will never get synced untill cluster restart

The problem is observed, when the node is not joining in middleware upgrade, 
and evetually upgrade fails.
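
The failure mode in steps 4 to 7 can be pictured with a toy model. This is only a sketch, not OpenSAF code: `sync_attempt` and its parameters are hypothetical names, and the 20 second limit is simply read off the log line in step 6. It assumes that a CCB in critical state can only be resolved by an enabled PBE, which is what the warning message suggests.

```python
def sync_attempt(critical_ccbs, pbe_enabled, wait_limit_s=20, step_s=1):
    """Toy model of one IMMND sync attempt (hypothetical, not OpenSAF code).

    critical_ccbs: ids of CCBs that are in critical state.
    pbe_enabled:   assumption: only an enabled PBE can resolve a critical CCB.
    Returns True if the sync may proceed, False if the attempt is aborted.
    """
    pending = set(critical_ccbs)
    waited = 0
    while pending:
        if waited >= wait_limit_s:
            return False      # models "Aborting this sync attempt"
        if pbe_enabled:
            pending.pop()     # assumption: the PBE resolves one CCB per step
        waited += step_s
    return True
```

In this model every retry with `pbe_enabled=False` and a non-empty set returns `False`, which mirrors step 7: the node cannot sync until PBE is re-enabled or the cluster is restarted.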


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #1747 IMMND trying to start PBE process while stopping OpenSAF services

2016-04-20 Thread Anders Bjornerstedt

- **Comment**:

Instead of seeing this as a minor defect, it could be seen as being part of
enhancement #56.
There are a lot of "misleading" log messages when shutting down OpenSAF. The
main reason is that OpenSAF's intended normal use is to never shut down (except
during testing).



---

** [tickets:#1747] IMMND trying to start PBE process while stopping OpenSAF 
services**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Mon Apr 11, 2016 10:30 AM UTC by Chani Srivastava
**Last Updated:** Wed Apr 13, 2016 06:55 AM UTC
**Owner:** nobody


Setup:
Changeset- 7436
Version - opensaf 5.0
1-PBE enabled
Issue is not observed always.

Apr 11 13:32:52 OSAF-SC1 opensafd: Stopping OpenSAF Services
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Shutdown initiated
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Terminating all AMF components
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: NO IMM PBE received SIG_TERM, closing db 
handle
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: IN IMM PBE process EXITING...
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 18 <545, 2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 18 <545, 
2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: WA Persistent back-end process has 
apparently died.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO STARTING PBE process.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO 
pbe-db-file-path:/home/chani/immPBE/imm.db VETERAN:1 B:0
Apr 11 13:32:53 OSAF-SC1 osafckptnd[30049]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafsmfd[29976]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaflckd[30057]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 2412 <321, 2010f> (safLckService)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 2412 
<321, 2010f> (safLckService)
Apr 11 13:32:53 OSAF-SC1 osaflcknd[30032]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafclmna[29860]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmd[29888]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaffmd[29878]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafrded[29869]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafevtd[30088]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 2413 <315, 2010f> (safEvtService)
Apr 11 13:32:53 OSAF-SC1 osafckptd[30097]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. 
Marking it as doomed 2411 <330, 2010f> (safCheckPointService)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafmsgd[30011]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafmsgnd[29995]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafsmfnd[29978]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaflogd[29914]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafntfimcnd[5780]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Apr 11 13:32:53 OSAF-SC1 osafclmd[29940]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[0] == 
'/usr/lib64/opensaf/osafimmpbed'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[1] == '--recover'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[2] == '--pbe'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[3] == '/home/chani/immPBE/imm.db'
Apr 11 13:32:53 OSAF-SC1 osafimmpbed: ER osafimmpbe is not started by osafimmnd




[tickets] [opensaf:tickets] #48 IMM: Support for transactionally safe reads

2015-11-05 Thread Anders Bjornerstedt

- **status**: accepted --> assigned
- **assigned_to**: Anders Bjornerstedt --> Zoran Milinkovic



---

** [tickets:#48] IMM: Support for transactionally safe reads**

**Status:** assigned
**Milestone:** 5.0.FC
**Created:** Wed May 08, 2013 07:48 AM UTC by Anders Bjornerstedt
**Last Updated:** Sun Nov 01, 2015 09:36 PM UTC
**Owner:** Zoran Milinkovic


Migrated from:
http://devel.opensaf.org/ticket/3111

The Ccb concept as defined by the IMM SAF standard does not
include any support for safe reads. That is, object reads that
are protected and part of a ccb/transaction.

The closest thing it has to safe reads is the admin-owner concept.
By setting admin-owner for not just the objects to be changed,
but also for objects included in the read-set of the ccb, the risk
is reduced but not eliminated for the CCB being committed with an
inconsistent read-set. The reason the risk is not eliminated is
that concurrent CCBs are allowed under the same admin-owner.
Another reason is that it is all too easy for applications
to perform non-repeatable-reads using accessor-get or iterations,
without remembering to set admin-owner over the read objects.
I suspect that this is the rule rather than the exception.

The 'read-set' for a ccb/transaction is the set of objects that
the ccb/transaction needs to read (and have unchanged or only
changed by the same ccb/transaction) until the ccb/transaction
terminates. The cardinal example would be an OI doing validation
in the completed callback. In general the OI needs to validate the
changes not only within the limited context of the changed objects,
but also relative to other objects that may not be changed by that
specific transaction. Currently all OIs would need to maintain internal
copies of all config data that they manage to achieve that. With
safe-read this is no longer necessary. Some interrelated data models
may also be managed by several OIs. The safe-read mechanism
uses shared locking, allowing several OIs to safe-read access the same
objects from different CCBs.

This enhancement proposes to add an additional ccb related
function for reading an object and associating that read with
what is (or is equivalent to) a shared readlock.

The OpenSAF IMM implementation already implements exclusive
write locks for create/delete/modify operations in a ccb.
Thus a Ccb that succeeds in invoking such a mutating operation
will reserve exclusive write access to that object until the Ccb
is terminated by commit or abort. The exclusivity is only in
relation to other CCB operations (including safe reads).
The accessor and iteration APIs still allow other processes to
perform non repeatable reads, i.e. non transactional reads, i.e.
unsafe reads, concurrently with an open CCB that is mutating such
objects. Such unsafe reads are allowed without considering changes
pending in ongoing CCBs or what admin-owner is set for the object.

The new API that is proposed looks like this:

saImmOmCcbObjectRead(SaImmCcbHandleT ccbHandle,
                     SaConstStringT objectName,
                     const SaImmAttrNameT *attributeNames,
                     SaImmAttrValuesT_2 ***attributes);

It has a signature very similar to saImmOmAccessorGet_2,
differing only in that it takes a ccbHandle instead of an accessorHandle.
The semantics of the API are identical to accessorGet, with the exception that
any returned config attributes are from the *latest* version of the object that
is locked by this ccb.

This operation will succeed unless the object is write locked by another Ccb.
It will succeed if the object is not locked by other Ccbs or if it is only
read-locked (shared) by other Ccbs. Another Ccb trying to write lock this object
when this ccb has a shared read-lock will fail and have to wait at least until
after this ccb is terminated. Another Ccb trying to read lock this object when
this ccb has a shared read-lock will succeed and obtain a read-lock.

A safe read on an object that is already write locked by the same Ccb for
create or modify will succeed, but will not change the lock-type, and will
provide the current latest version of the object in the context of the CCB.
A safe read on an object that is already write locked by the same Ccb for
delete will fail with ERR_NOT_EXIST.

Thus any modifications done to the object by this ccb but not yet committed
will be reflected in the result returned by the safe read call.
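
The locking rules described above can be summarised in a small model. This is a sketch under stated assumptions, not the IMM implementation: `LockTable`, `CcbLockError` and `NotExistError` are hypothetical names, `NotExistError` merely stands in for ERR_NOT_EXIST, and the rule that a CCB may upgrade its own read lock when it is the only reader is an assumption that the proposal text does not cover.

```python
class CcbLockError(Exception):
    """Lock request conflicts with another CCB (hypothetical error type)."""


class NotExistError(Exception):
    """Stands in for ERR_NOT_EXIST in this sketch."""


class LockTable:
    def __init__(self):
        self.readers = {}  # object name -> set of ccb ids holding a shared read lock
        self.writer = {}   # object name -> (ccb id, "create" | "modify" | "delete")

    def safe_read(self, ccb, obj):
        owner = self.writer.get(obj)
        if owner is not None:
            owner_ccb, op = owner
            if owner_ccb != ccb:
                raise CcbLockError(f"{obj} is write locked by ccb {owner_ccb}")
            if op == "delete":
                raise NotExistError(obj)
            return "write"  # lock type unchanged; the CCB reads its own pending version
        self.readers.setdefault(obj, set()).add(ccb)
        return "read"

    def write_lock(self, ccb, obj, op):
        owner = self.writer.get(obj)
        if owner is not None and owner[0] != ccb:
            raise CcbLockError(f"{obj} is write locked by ccb {owner[0]}")
        # Assumption: a CCB may upgrade its own read lock if it is the only reader.
        if self.readers.get(obj, set()) - {ccb}:
            raise CcbLockError(f"{obj} is read locked by another ccb")
        self.writer[obj] = (ccb, op)

    def terminate(self, ccb):
        """Release every lock held by the CCB on commit or abort."""
        for holders in self.readers.values():
            holders.discard(ccb)
        self.writer = {o: w for o, w in self.writer.items() if w[0] != ccb}
```

Two CCBs calling `safe_read` on the same object both get a shared lock, while a third CCB calling `write_lock` on it fails until both readers have terminated.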

All of the above should be recognized as pretty much standard
transactional behavior for the OM API. What then about implementers
and the OI API? After all, one typically important type of
participant in a Ccb is the OI performing validation of the Ccb.
Validation should normally include reading both data modified by
the Ccb and data not modified by the Ccb that still needs to be
part of the read-set for the transaction, so that it can commit
without violating integrity constraints.

The proposal is for the OI to obtain a ccb-handle using the
existing saImmOiAugmentCcbInitialize API. Then to use the

[tickets] [opensaf:tickets] #1554 imm: validation abort should have precedence over resource abort

2015-10-21 Thread Anders Bjornerstedt

- Description has changed:

Diff:



--- old
+++ new
@@ -1,4 +1,4 @@
-When CCB is applied, the CCB may receive multiple error strings from more OIs.
-Ticket #744 implemented validation/resource abort error strings in the way 
that the precedence has the first receiving error string. Other 
validation/resource abort will be ignored.
+When CCB is applied, the CCB may receive multiple error strings from several 
OIs.
+Ticket #744 implemented validation/resource abort error strings in the way 
that precedence was given the first received error string. Subsequent strings 
where ignored.
 
 Validation abort reason is more significant than resource abort, and it must 
override resource abort error string.






---

** [tickets:#1554] imm: validation abort should have precedence over resource 
abort**

**Status:** accepted
**Milestone:** 4.7.RC1
**Created:** Wed Oct 21, 2015 10:38 AM UTC by Zoran Milinkovic
**Last Updated:** Wed Oct 21, 2015 10:42 AM UTC
**Owner:** Zoran Milinkovic


When a CCB is applied, the CCB may receive multiple error strings from several
OIs.
Ticket #744 implemented validation/resource abort error strings so that
precedence was given to the first received error string. Subsequent strings were
ignored.

A validation abort reason is more significant than a resource abort, and it must
override the resource abort error string.
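
The requested precedence rule can be sketched as follows. This is an illustration only, not the IMM code: `pick_abort_reason` and the two kind constants are hypothetical, and keeping the first string among reports of the same kind is an assumption carried over from the #744 behaviour described above.

```python
VALIDATION = "validation"
RESOURCE = "resource"


def pick_abort_reason(reports):
    """Choose which abort error string a CCB should keep.

    reports: iterable of (kind, error_string) pairs in arrival order.
    Returns the chosen pair, or None if nothing was reported.
    """
    chosen = None
    for kind, text in reports:
        if chosen is None:
            chosen = (kind, text)  # first report wins by default
        elif chosen[0] == RESOURCE and kind == VALIDATION:
            chosen = (kind, text)  # validation overrides an earlier resource abort
    return chosen
```

With this rule a resource abort that arrives first is replaced by a later validation abort, but never the other way round.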



[tickets] [opensaf:tickets] #1514 Opensaf on payload failed to come up and IMMD on active controller faulted

2015-10-15 Thread Anders Bjornerstedt

I see this as a duplicate of #1291, which is closed as invalid.
The basic problem is communication overload.

The only solution currently available for deployments that see this issue is to
reduce the value of the opensafImmSyncBatchSize config attribute in the OpenSAF
IMM service object:
opensafImm=opensafImm,safApp=safImmService

Beyond this, there are various enhancements, in MDS or OpenSAF, that could
potentially reduce the risk of communication overload.





---

** [tickets:#1514]  Opensaf on payload failed to come up and IMMD on active 
controller faulted**

**Status:** assigned
**Milestone:** 4.7.RC1
**Created:** Mon Oct 05, 2015 10:03 AM UTC by Ritu Raj
**Last Updated:** Wed Oct 07, 2015 12:37 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[1513.tgz](https://sourceforge.net/p/opensaf/tickets/1514/attachment/1513.tgz) 
(7.1 MB; application/x-compressed-tar)


Setup:
Changeset- 6901
4 nodes configured with single PBE and a load of 30K objects

Issue observed:
* Payload failed to join the cluster, and later the active controller rebooted.

Steps performed:
* Started OpenSAF on controller SC-1, and SC-1 took the active role.

Oct  5 12:33:31 SLES-64BIT-SLOT1 osafrded[3129]: NO No peer available => 
Setting Active role for this node
* Later, started OpenSAF on slot-2, where opensafd failed because the disk was
full. After resolving that and restarting OpenSAF on slot-2, both nodes joined
the cluster.

Oct  5 12:45:34 SLES-32BIT-SLOT2 osafrded[15186]: NO Peer rde@2010f has active 
state => Assigning Standby role to this node


* After the controllers had formed the cluster, started OpenSAF on the remaining
two payloads at the same time.
* PL-3 joined the cluster successfully.

Oct  5 13:03:19 SLES-64BIT-SLOT3 kernel: [495958.582544] TIPC: Own node address 
<1.1.3>, network identity 5234

Oct  5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 17601
Oct  5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Epoch set to 125 in 
ImmModel
Oct  5 13:09:35 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Implementer (applier) 
connected: 27 (@OpenSafImmReplicatorB) <0, 2010f>


* PL-4 failed to join the cluster:

Oct  5 13:03:38 SLES-32BIT-SLOT4 kernel: [436326.659526] TIPC: Own node address 
<1.1.4>, network identity 5234
Oct  5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Oct  5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Oct  5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - 
problems with MDS ? 5
Oct  5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - 
problems with MDS ? 5
...
Oct  5 13:04:28 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - 
problems with MDS ? 50
Oct  5 13:04:29 SLES-32BIT-SLOT4 osafimmnd[8781]: ER Failed to load/sync. 
Giving up after 51 seconds, restarting..
Oct  5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed   DESC:IMMND
Oct  5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Going for recovery
...
Oct  5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed to load/sync. 
Giving up after 51 seconds, restarting..
Oct  5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Could Not RESPAWN IMMND
Oct  5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed   DESC:IMMND
Oct  5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER FAILED TO RESPAWN
Oct  5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER IMMND - Periodic server 
job failed
Oct  5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed, exiting...
Oct  5 13:06:41 SLES-32BIT-SLOT4 kernel: [436509.187946] TIPC: Disabling bearer 


* After opensafd failed to come up on PL-4, SC-1 rebooted with IMMD exiting.

Oct  5 13:08:52 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting 
PBE_PRTO_PURGE_MUTATIONS, epoch:123
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO ImmModel::getPbeOi reports 
missing PbeOi locally => unsafe
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting 
PBE_PRTO_PURGE_MUTATIONS, epoch:123
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO SU failover probation 
timer started (timeout: 12000 ns)
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO Performing failover of 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated 
from 'componentFailover' to 'suFailover'
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 
'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Oct  5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
to:healthCheckcallbackTimeout Recovery is:suFailover


* PL-4 joined the cluster, after opensafd is started on PL-4 after some

[tickets] [opensaf:tickets] #1526 imm: 1PBE can see db as locked

2015-10-13 Thread Anders Bjornerstedt

- **status**: review --> accepted
- **Comment**:

I nack'ed the patch because the IMM service already has a restart mechanism for
the PBE if it gets stuck, and the symptom shown here must result from a bug (if
this truly is on 1PBE).

If there is not enough information to locate the bug, then the problem needs to
be reproduced with trace.

If it cannot be reproduced, then we close the ticket as not reproducible.



---

** [tickets:#1526] imm: 1PBE can see db as locked**

**Status:** accepted
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:58 AM UTC
**Owner:** Neelakanta Reddy


When the disk is full, sqlite returns an error.

Sep 18 13:42:02 SC-2 osafimmpbed: ER SQL statement ('COMMIT TRANSACTION') 
failed because:  disk I/O error
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Invalid error reported implementer 
'OpenSafImmPBE', Ccb 321 will be aborted
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Ccb 321 ABORTED (TraceC)
Sep 18 13:42:02 SC-2 osafimmpbed: WA Failed to find CCB object for 141/321


Due to continuous CCB operations (even though the disk is full), the 1PBE logs
the following messages for more than 3 hours:

messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.



messages.7:Sep 18 14:22:22 SC-2 osafimmpbed: WA Sqlite db locked by other 
thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other 
thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other 
thread.
messages.7:Sep 18 14:22:24 SC-2 osafimmpbed: WA Sqlite db locked by other thread

After freeing the space, the PBE is still stuck in "Sqlite db locked by other
thread".
This prevents any further operations.
Once the PBE is killed, the imm.db is regenerated and the CCB operations are
applied.

Solution (1PBE):

For the 1PBE case, which is not multi-threaded, if the "sqlite db locked" case
is reached, abort the PBE and let the PBE be regenerated (instead of blocking
the PBE process).

 




[tickets] [opensaf:tickets] #1526 imm: exit the 1PBE when pbeBeginTrans sees db as locked

2015-10-08 Thread Anders Bjornerstedt

Question: How can this happen in the 1PBE case, when there is only one
user thread using the sqlite instance?

Another relevant question is why/when do you observe this now?
The test case or test setup must be special somehow.

With only one thread this case should be impossible.
It suggests heap corruption could be the cause.

Some years ago we did see problems, although not exactly of this kind, in
conjunction with repeated failovers, where the new PBE managed to start while
the old PBE (on the other SC) was still executing (slow to terminate). But the
distributed file level protection uses file system locking, and the symptoms
should be different.


---

** [tickets:#1526] imm: exit the 1PBE  when pbeBeginTrans sees db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:21 AM UTC
**Owner:** Neelakanta Reddy



[tickets] [opensaf:tickets] #1526 imm: exit the 1PBE when pbeBeginTrans sees db as locked

2015-10-08 Thread Anders Bjornerstedt

I looked at the code, and the error message is correct, but the "lock" is the
PBE "spin lock" created for handling 2PBE. The fact that it finds it locked in
1PBE means there is a logical bug somewhere in 1PBE.

Most likely it is some error case where there is a bailout from commit
processing without correct cleanup.
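
The suspected class of bug can be illustrated with a toy model. This is a sketch only, not the PBE code: the `Pbe` class, its `locked` flag (standing in for the PBE "spin lock") and both commit methods are hypothetical. It shows how an early bailout on an error leaves the flag set, so that every later transaction sees the db as "locked", and how releasing the flag on every exit path avoids that.

```python
class DiskError(Exception):
    """Stands in for the sqlite 'disk I/O error' seen in the logs."""


class Pbe:
    def __init__(self):
        self.locked = False  # stands in for the PBE "spin lock"

    def begin_trans(self):
        if self.locked:
            raise RuntimeError("Sqlite db locked by other thread.")
        self.locked = True

    def commit_leaky(self, fail):
        """Bails out on error without cleanup, leaving the lock set."""
        self.begin_trans()
        if fail:
            raise DiskError("disk I/O error")
        self.locked = False

    def commit_safe(self, fail):
        """Releases the lock on every exit path."""
        self.begin_trans()
        try:
            if fail:
                raise DiskError("disk I/O error")
        finally:
            self.locked = False
```

After one failed `commit_leaky`, every further commit on the same object raises the "locked" error even though there is only one thread, which matches the symptom reported in the ticket.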


---

** [tickets:#1526] imm: exit the 1PBE  when pbeBeginTrans sees db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:43 AM UTC
**Owner:** Neelakanta Reddy



[tickets] [opensaf:tickets] #1526 imm: exit the 1PBE when pbeBeginTrans sees db as locked

2015-10-08 Thread Anders Bjornerstedt

I guess it could be that the PBE level message "Sqlite db locked by other
thread" is plain wrong, i.e. misleading.



---

** [tickets:#1526] imm: exit the 1PBE  when pbeBeginTrans sees db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:35 AM UTC
**Owner:** Neelakanta Reddy



[tickets] [opensaf:tickets] #1526 imm: 1PBE can see db as locked

2015-10-08 Thread Anders Bjornerstedt

- **summary**: imm: exit the 1PBE  when pbeBeginTrans sees db as locked --> 
imm: 1PBE can see db as locked
- **Comment**:

Changed ticket slogan to describe the problem.



---

** [tickets:#1526] imm: 1PBE can see db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:55 AM UTC
**Owner:** Neelakanta Reddy



[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7

2015-10-07 Thread Anders Bjornerstedt

- **status**: review --> fixed
- **Comment**:

changeset:   6979:2a5befe801cf
tag:         tip
parent:      6977:93c7269c4797
user:        Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date:        Wed Oct 07 12:42:52 2015 +0200
summary:     IMM: Update immsv/README describing imm enhancements in 4.7 [#1499]

changeset:   6978:46eae48ebfba
branch:      opensaf-4.7.x
parent:      6976:1736dee70266
user:        Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date:        Wed Oct 07 12:42:52 2015 +0200
summary:     IMM: Update immsv/README describing imm enhancements in 4.7 [#1499]



---

** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7**

**Status:** fixed
**Milestone:** 4.7.RC1
**Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt
**Last Updated:** Mon Oct 05, 2015 08:27 AM UTC
**Owner:** Anders Bjornerstedt






[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7

2015-10-05 Thread Anders Bjornerstedt

- **status**: review --> accepted



---

** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7**

**Status:** accepted
**Milestone:** 4.7.RC1
**Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Oct 02, 2015 11:04 AM UTC
**Owner:** Anders Bjornerstedt






[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7

2015-10-05 Thread Anders Bjornerstedt

- **status**: accepted --> review



---

** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7**

**Status:** review
**Milestone:** 4.7.RC1
**Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt
**Last Updated:** Mon Oct 05, 2015 08:27 AM UTC
**Owner:** Anders Bjornerstedt






[tickets] [opensaf:tickets] #1503 IMM: Augumented CCb client went down the OM client should get err

2015-10-05 Thread Anders Bjornerstedt

There is also a timeout on the OI callback (the create callback harboring the 
augmentation).
Normally the OI timeout is shorter than the IMMA_SYNCR_TIMEOUT, so normally the
OM client should get an error on the ccb-create downcall before timeout.

But if the OM client has reduced the syncr timeout, or the OI has increased its 
OI timeout, then you could end up getting ERR_TIMEOUT on the OI side, without 
anything being wrong anywhere.



---

** [tickets:#1503] IMM: Augumented CCb client went down the OM client should 
get err**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Fri Sep 25, 2015 09:18 AM UTC by Neelakanta Reddy
**Last Updated:** Fri Sep 25, 2015 09:18 AM UTC
**Owner:** Neelakanta Reddy


The OM is on node1 and the OI is on node2.
The OM creates an object.
In the OI, the CCB is augmented by creating an object, and then the OI client 
goes down.
The CCB gets aborted in the IMM database, but the OM create API does not get a 
return value; after SYNC_TIMEOUT the OM API receives TIME_OUT.




[tickets] [opensaf:tickets] #1503 IMM: Augumented CCb client went down the OM client should get err

2015-10-05 Thread Anders Bjornerstedt

Yes, I forgot that the OI callback timeout gets disabled by a ccb augmentation 
inside the callback.


---

** [tickets:#1503] IMM: Augumented CCb client went down the OM client should 
get err**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Fri Sep 25, 2015 09:18 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Oct 05, 2015 10:09 AM UTC
**Owner:** Neelakanta Reddy



[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7

2015-10-02 Thread Anders Bjornerstedt

- **status**: accepted --> review



---

** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7**

**Status:** review
**Milestone:** 4.7.RC1
**Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt
**Last Updated:** Thu Oct 01, 2015 02:37 PM UTC
**Owner:** Anders Bjornerstedt






[tickets] [opensaf:tickets] #1425 IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT

2015-10-01 Thread Anders Bjornerstedt

- **status**: assigned --> unassigned
- **Milestone**: future --> 5.0



---

** [tickets:#1425] IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT**

**Status:** unassigned
**Milestone:** 5.0
**Created:** Fri Jul 24, 2015 12:49 PM UTC by Anders Bjornerstedt
**Last Updated:** Wed Sep 23, 2015 04:23 PM UTC
**Owner:** Hung Nguyen


The saImmOmClassCreate_2() API allows the user to provide a list of attribute 
definitions. An attribute definition may include a default value.

The default value will be assigned to this attribute in an instance being created
by the saImmOmCcbObjectCreate_2() or the saImmOiRtObjectCreate_2() APIs, if the
user does not provide a value for that attribute.

But a user/OI may later update such an object/attribute assigning the empty value
to the attribute. So the default value mechanism is only effective for object
creation and not later in the life cycle of the object. This makes the default
attribute value mechanism weaker than some users would like.

This enhancement proposes a new attribute flag SA_IMM_ATTR_STRONG_DEFAULT.
This flag will only be allowed to be set on an attribute definition that includes
a default value.

The meaning of the flag is that if a user attempts an update of an object/attribute
that assigns the empty value to such an attribute, then the IMM will replace, i.e.
override, that value with the default value defined in the class.
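
The proposed semantics could be sketched as follows. This is a minimal model 
under stated assumptions, not the IMM implementation: the flag bit value, the 
AttrDef struct and both function names are hypothetical, and attribute values 
are modelled as a vector of strings where an empty vector stands for the 
"empty value".

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit value chosen for this sketch only; the real value of
// SA_IMM_ATTR_STRONG_DEFAULT is whatever the IMM headers define.
constexpr std::uint64_t kAttrStrongDefault = 1ULL << 0;

// An empty vector models the "empty value" (no value assigned).
using AttrValue = std::vector<std::string>;

struct AttrDef {
  std::uint64_t flags;
  AttrValue default_value;  // empty: the definition has no default value
};

// The flag is only allowed on a definition that includes a default value.
bool is_valid_definition(const AttrDef& def) {
  const bool strong = (def.flags & kAttrStrongDefault) != 0;
  return !strong || !def.default_value.empty();
}

// Value stored after an update: an attempt to assign the empty value to a
// strong-default attribute is overridden with the class default.
AttrValue apply_update(const AttrDef& def, const AttrValue& new_value) {
  const bool strong = (def.flags & kAttrStrongDefault) != 0;
  if (new_value.empty() && strong) {
    return def.default_value;
  }
  return new_value;
}
```

Without the flag, apply_update() returns the empty value unchanged, which is 
the current behaviour described above.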






[tickets] [opensaf:tickets] #1425 IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT

2015-10-01 Thread Anders Bjornerstedt

- **status**: unassigned --> accepted



---

** [tickets:#1425] IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT**

**Status:** accepted
**Milestone:** 5.0
**Created:** Fri Jul 24, 2015 12:49 PM UTC by Anders Bjornerstedt
**Last Updated:** Thu Oct 01, 2015 10:38 AM UTC
**Owner:** Hung Nguyen



[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client

2015-09-28 Thread Anders Bjornerstedt

- **Comment**:

I have a problem with this ticket.
Appliers are intentionally not synced.
They should not need to be synced.
The question here is how you manage to execute a sync while a ccb is active.
Non-empty ccbs are terminated before the actual sync can start.

So a bug seems to have been introduced somewhere.




---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to 
sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 04:27 AM UTC
**Owner:** Hung Nguyen


Set an applier to a class. Then exit immapplier to detach the applier.


root@SC1:~# immapplier -a @whatever Test


 
 
Let another node join the cluster.
Create a CCB which is active on an object of 'Test' class. Don't commit the CCB.


root@SC1:~# immcfg
> immcfg -c Test test=1
>



Try to set applier again.


root@SC1:/srv/shared# immapplier -a @whatever Test
Implementer: @whatever
ImmVersion: A 2 16
error - saImmOiImplementerSet FAILED: SA_AIS_ERR_TRY_AGAIN (6)


SC-1


osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 225
osafimmnd [419:ImmModel.cc:12967] >> implementerSet
osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [419:ImmModel.cc:13040] TR TRY_AGAIN: ccb 2 is active on object 'test=1' bound to class applier '@whatever'. Can not re-attach applier
osafimmnd [419:ImmModel.cc:13156] << implementerSet


PL-3


osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 225
osafimmnd [392:ImmModel.cc:12967] >> implementerSet
osafimmnd [392:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [392:ImmModel.cc:13087] NO Implementer (applier) connected: 5 (@whatever) <0, 2010f>
osafimmnd [392:ImmModel.cc:13156] << implementerSet


IMMND on SC-1 rejected the implSet request but IMMND on PL-3 accepted it.
The applier was not synced to PL-3 (mAppliers.empty() returned true) so the 
implSet request passed the ccb check.

if (!obj->mClassInfo->mAppliers.empty()) {
    ImplementerSet::iterator ii = obj->mClassInfo->mAppliers.begin();
    for (; ii != obj->mClassInfo->mAppliers.end(); ++ii) {
        if ((*ii) == info) {
            TRACE("TRY_AGAIN: ccb %u is active on object '%s' "
                  "bound to class applier '%s'. Can not re-attach applier",
                  ccb->mId, omit->first.c_str(), implName.c_str());
            err = SA_AIS_ERR_TRY_AGAIN;
            goto done;
        }
    }
}
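
To make the difference between the two nodes concrete, here is a stripped-down, 
self-contained model of the check above. The types are simplified stand-ins 
invented for this sketch (only mAppliers keeps its name from the excerpt, and 
ImplementerSet is assumed to be a set of pointers); the function name and the 
result enum are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Simplified stand-ins for the IMMND model types; only what the check needs.
struct ImplementerInfo {
  std::string name;
};
using ImplementerSet = std::set<ImplementerInfo*>;  // assumed layout
struct ClassInfo {
  ImplementerSet mAppliers;  // class appliers known on this node
};

enum CheckResult { CHECK_OK, CHECK_TRY_AGAIN };

// Reject the attach only if `info` is a known class applier of a class that
// has an object in an active ccb. An empty mAppliers can never reject.
CheckResult check_applier_attach(const ClassInfo& cls,
                                 const ImplementerInfo* info,
                                 bool ccb_active_on_object) {
  if (!ccb_active_on_object) {
    return CHECK_OK;
  }
  for (const ImplementerInfo* applier : cls.mAppliers) {
    if (applier == info) {
      return CHECK_TRY_AGAIN;
    }
  }
  return CHECK_OK;
}
```

With the applier present in mAppliers (as on SC-1) the call returns 
CHECK_TRY_AGAIN; with an empty mAppliers (as on PL-3) the same request returns 
CHECK_OK, which is the divergence seen in the traces.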

Now commit the CCB and try to set the applier again.

SC-1


osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 226
osafimmnd [419:ImmModel.cc:12967] >> implementerSet
osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [419:ImmModel.cc:13087] NO Implementer (applier) connected: 6 (@whatever) <226, 2010f>
osafimmnd [419:ImmModel.cc:13156] << implementerSet


PL-3


osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 226
osafimmnd [392:ImmModel.cc:12967] >> implementerSet
osafimmnd [392:ImmModel.cc:13003] T7 ERR_EXIST: Registered implementer already exists: @whatever
osafimmnd [392:ImmModel.cc:13005] << implementerSet


The applier had different ids on SC-1 and PL-3. When a new node joins the 
cluster, IMMND on PL-3 will crash when verifying the implementers.


PL3 osafimmnd[392]: ER Sync-verify: Established node has different Implementer-id: 5 for name: @whatever, sync says 6.




[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client

2015-09-28 Thread Anders Bjornerstedt

With the above solution there is the issue that the check is then not done in 
fevs order.
By the time the implementer-set arrives over fevs at all nodes, a ccb-operation 
may have been created that interferes, resulting in the implementer-set having 
to be aborted anyway.

The local immnd thus has to run the applier checks again when receiving the 
fevs message for implementer-set.
If that check fails, it rejects the operation, replies with an error to the 
client, and broadcasts an implementer_clear over fevs.
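
A rough sketch of that receive-side decision, for illustration only. The struct 
and function names are hypothetical, and the assumption that non-originating 
nodes simply accept the implementer-set (and later process the 
implementer_clear) is mine, not something stated above.

```cpp
#include <cassert>

// What the receiving IMMND does with an implementer-set that arrived over fevs.
struct ImplSetOutcome {
  bool accept;                       // register the implementer locally
  bool reply_error_to_client;        // only meaningful at the originating node
  bool broadcast_implementer_clear;  // undo the registration at all nodes
};

// originated_here: the applier is attaching at this node.
// applier_check_passes: result of re-running the local class/object-applier
// ccb interference check in fevs order.
ImplSetOutcome on_fevs_implementer_set(bool originated_here,
                                       bool applier_check_passes) {
  // Assumption: nodes where the applier is not attaching hold no
  // class/object-applier data for it, so they accept.
  if (!originated_here || applier_check_passes) {
    return ImplSetOutcome{true, false, false};
  }
  // Check failed at the originating node: reject, tell the client, and
  // broadcast implementer_clear so the other nodes drop the registration.
  return ImplSetOutcome{false, true, true};
}
```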


---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to 
sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 09:26 AM UTC
**Owner:** Hung Nguyen



[tickets] [opensaf:tickets] #1504 imm: Implicit class/object-applier checked by OiImplementeSet is incorrect

2015-09-28 Thread Anders Bjornerstedt

- **summary**: imm: Appliers for classes and objects are not synced to 
sync-client --> imm: Implicit class/object-applier checked by OiImplementeSet 
is incorrect



---

** [tickets:#1504] imm: Implicit class/object-applier checked by 
OiImplementeSet is incorrect**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 09:38 AM UTC
**Owner:** Hung Nguyen



[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client

2015-09-28 Thread Anders Bjornerstedt

The applier names are synced, but the class/object-applier data is not synced.
That is intentional and I don't want a solution that tries to sync all applier 
information to all nodes.

The class-applier and object-applier mechanism is inherently local, i.e. only 
used at the node where the applier exists. Remember that an applier is a 
listener and not a true participant in ccbs, so its existence should only 
matter locally. The only thing that is global is the existence of an applier 
with a certain name and the current location, if any, for that exact applier 
with that name.

Having said that, it is still important that the local class/object applier is 
not allowed to attach in such a way that it can see an incomplete ccb.

I am thinking about what the best approach for a fix would be.
Don't start doing some complex implementation of this yet.



---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to 
sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 07:19 AM UTC
**Owner:** Hung Nguyen



[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client

2015-09-28 Thread Anders Bjornerstedt

The problem is the feature of *implicit* class-implementer-set and *implicit* 
object-implementer-set.
Ironically this feature is practically useless for appliers.

One possible (and relatively simple) solution would be to only do the ccb 
interference checks for *appliers* at the node where the applier is actually 
attaching. That would almost be in fevs_local_checks, except that 
implementer-set is not a regular fevs message at the sending side.
So instead it would be in immnd_evt_proc_impl_set in immnd_evt.c.

If the check fails then the local IMMND simply rejects the request with 
TRY_AGAIN (or ERR_BUSY would in reality be better here since the immsv has no 
control over how long the wait will be).

The current applier check at the fevs receiving side for implementer-set is 
simply removed.
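
As a sketch of what such a local pre-check might look like: the types and names 
below are hypothetical and do not reflect the real signature or data structures 
of immnd_evt_proc_impl_set; the per-node state is reduced to two containers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical decision taken at the node where the applier is attaching.
enum class ImplSetDecision { SendOverFevs, RejectTryAgain };

// Hypothetical view of the node-local applier data.
struct LocalApplierState {
  // applier name -> classes it is bound to as class-applier on this node
  std::map<std::string, std::set<std::string>> class_appliers;
  // classes that currently have an object in an active (uncommitted) ccb
  std::set<std::string> classes_in_active_ccb;
};

// Reject locally if re-attaching would let the applier see an incomplete ccb;
// otherwise let the implementer-set go out over fevs as before.
ImplSetDecision local_impl_set_check(const LocalApplierState& st,
                                     const std::string& impl_name) {
  auto it = st.class_appliers.find(impl_name);
  if (it == st.class_appliers.end()) {
    return ImplSetDecision::SendOverFevs;  // no local class-applier binding
  }
  for (const std::string& cls : it->second) {
    if (st.classes_in_active_ccb.count(cls) != 0) {
      return ImplSetDecision::RejectTryAgain;
    }
  }
  return ImplSetDecision::SendOverFevs;
}
```

Because only the attaching node evaluates the check, nodes without the 
class/object-applier data can no longer reach a different verdict than the 
attaching node.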




---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to 
sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 08:44 AM UTC
**Owner:** Hung Nguyen


Set an applier to a class. Then exit immapplier to detach the applier.


root@SC1:~# immapplier -a @whatever Test


 
 
Let another node join the cluster.
Create a CCB which is active on an object of 'Test' class. Don't commit the CCB.


root@SC1:~# immcfg
> immcfg -c Test test=1
>



Try to set applier again.


root@SC1:/srv/shared# immapplier -a @whatever Test
Implementer: @whatever
ImmVersion: A 2 16
error - saImmOiImplementerSet FAILED: SA_AIS_ERR_TRY_AGAIN (6)


SC-1


osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) 
from 0
osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 225
osafimmnd [419:ImmModel.cc:12967] >> implementerSet
osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [419:ImmModel.cc:13040] TR TRY_AGAIN: ccb 2 is active on object 
'test=1' bound to class applier '@whatever'. Can not re-attach applier
osafimmnd [419:ImmModel.cc:13156] << implementerSet


PL-3


osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) 
from 0
osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 225
osafimmnd [392:ImmModel.cc:12967] >> implementerSet
osafimmnd [392:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [392:ImmModel.cc:13087] NO Implementer (applier) connected: 5 
(@whatever) <0, 2010f>
osafimmnd [392:ImmModel.cc:13156] << implementerSet


IMMND on SC-1 rejected the implSet request but IMMND on PL-3 accepted it.
The applier was not synced to PL-3 (mAppliers.empty() returned true) so the 
implSet request passed the ccb check.

    if (!obj->mClassInfo->mAppliers.empty()) {
        ImplementerSet::iterator ii = obj->mClassInfo->mAppliers.begin();
        for (; ii != obj->mClassInfo->mAppliers.end(); ++ii) {
            if ((*ii) == info) {
                TRACE("TRY_AGAIN: ccb %u is active on object '%s' "
                      "bound to class applier '%s'. Can not re-attach applier",
                      ccb->mId, omit->first.c_str(), implName.c_str());
                err = SA_AIS_ERR_TRY_AGAIN;
                goto done;
            }
        }
    }
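
To illustrate why the two IMMNDs disagree, here is a minimal sketch of the check in isolation. It is not the ImmModel code: `ClassInfo`, `Result`, `checkApplierSet` and `replicasAgree` are hypothetical names. The only assumption taken from the ticket is that the class applier set is populated on SC-1 but empty on the sync client PL-3.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for the IMMND data structures; not the real ImmModel types.
enum class Result { Ok, TryAgain };

struct ClassInfo {
    std::set<std::string> appliers;  // class appliers known on this node
};

// Same shape as the ccb check quoted above: refuse to re-attach a class
// applier while an active CCB touches an object of that class.
Result checkApplierSet(const ClassInfo& cls, const std::string& implName,
                       bool ccbActiveOnObject) {
    assert(!implName.empty());
    if (ccbActiveOnObject && cls.appliers.count(implName) != 0) {
        return Result::TryAgain;
    }
    return Result::Ok;
}

// Runs the same request against two replicas and reports whether they agree.
bool replicasAgree(const ClassInfo& a, const ClassInfo& b,
                   const std::string& implName, bool ccbActiveOnObject) {
    return checkApplierSet(a, implName, ccbActiveOnObject) ==
           checkApplierSet(b, implName, ccbActiveOnObject);
}
```

With "@whatever" present in the set on one replica and missing on the other, the same request gives TRY_AGAIN on the first and OK on the second while a CCB is active, which is the divergence seen in the traces.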

Now commit the CCB and try to set the applier again.

SC-1


osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) 
from 0
osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 226
osafimmnd [419:ImmModel.cc:12967] >> implementerSet
osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [419:ImmModel.cc:13087] NO Implementer (applier) connected: 6 
(@whatever) <226, 2010f>
osafimmnd [419:ImmModel.cc:13156] << implementerSet


PL-3


osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) 
from 0
osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 226
osafimmnd [392:ImmModel.cc:12967] >> implementerSet
osafimmnd [392:ImmModel.cc:13003] T7 ERR_EXIST: Registered implementer already 
exists: @whatever
osafimmnd [392:ImmModel.cc:13005] << implementerSet


The applier now has different ids on SC-1 and PL-3. When a new node joins the 
cluster, IMMND on PL-3 will crash when verifying the implementers.


PL3 osafimmnd[392]: ER Sync-verify: Established node has different 
Implementer-id: 5 for name: @whatever, sync says 6.



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--

[tickets] [opensaf:tickets] #1313 osaf: opensaf does not start when long dn object is present in imm.db and cluster is reset

2015-09-24 Thread Anders Bjornerstedt

- **status**: unassigned --> duplicate
- **Component**: osaf --> log
- **Version**: 4.6 FC --> 4.6
- **Milestone**: 4.5.1 --> never
- **Comment**:

Duplicate of #1452 which is fixed.

https://sourceforge.net/p/opensaf/tickets/1452/



---

** [tickets:#1313] osaf: opensaf does not start when long dn object is present 
in imm.db and cluster is reset**

**Status:** duplicate
**Milestone:** never
**Created:** Mon Apr 13, 2015 08:57 AM UTC by Sirisha Alla
**Last Updated:** Fri Aug 14, 2015 12:39 PM UTC
**Owner:** Mathi Naickan
**Attachments:**

- 
[slot1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1313/attachment/slot1.tar.bz2)
 (269.6 kB; application/x-bzip)


This is observed on changeset 6377 (4.6 FC tag). The system is up with a single 
PBE and 50k objects. Long DNs were enabled. There is one long DN object in the 
cluster.

Syslog on SC-1:

Apr  9 15:49:14 SLES-64BIT-SLOT1 osafimmnd[10731]: WA Setting attr 
longDnsAllowed to 0 in opensafImm=opensafImm,safApp=safImmService not allowed 
when long RDN exists inside object: 
xattrName_testAdminOwnerClear_SubLevelScope_1011

Now the cluster is reset. Nodes in the cluster fail to come up for the 
following reason:

Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Persistent Back End OI 
attached, pid: 3465
Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Implementer connected: 1 
(OpenSafImmPBE) <10, 2010f>
Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO implementer for class 
'OpensafImm' is OpenSafImmPBE => class extent is safe.
Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 20 committing 
with ccbId:10003/4294967299
Apr 13 13:04:56 SLES-64BIT-SLOT1 osafimmnd[3439]: NO PBE-OI established on this 
SC. Dumping incrementally to file imm.db
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from 
LOGD
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Going for recovery
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN 
/usr/lib64/opensaf/clc-cli/osaf-logd attempt #1
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, 
pid=3452
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: Started
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: WA read_logsv_configuration(). 
All attributes could not be read
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO Log config system: high 0 
low 0, application: high 0 low 0
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO log root directory is: 
/var/log/opensaf/saflog
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LOG data group is:
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LGS_MBCSV_VERSION = 4
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: saImmOmSearchInitialize 
FAILED, rc = 13
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from 
LOGD
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN 
/usr/lib64/opensaf/clc-cli/osaf-logd attempt #2
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, 
pid=3495
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: Started
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: WA read_logsv_configuration(). 
All attributes could not be read
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO Log config system: high 0 
low 0, application: high 0 low 0
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO log root directory is: 
/var/log/opensaf/saflog
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LOG data group is:
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LGS_MBCSV_VERSION = 4
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: saImmOmSearchInitialize 
FAILED, rc = 13
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from 
LOGD
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER FAILED TO RESPAWN
Apr 13 13:07:24 SLES-64BIT-SLOT1 osaffmd[3419]: exiting for shutdown
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmd[3429]: exiting for shutdown
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmnd[3439]: NO No IMMD service => cluster 
restart, exiting
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmpbed: WA PBE lost contact with parent 
IMMND - Exiting
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafrded[3410]: exiting for shutdown
Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782513] TIPC: Disabling bearer 

Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782518] TIPC: Lost link 
<1.1.1:eth0-1.1.4:eth0> on network plane A
Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782521] TIPC: Lost contact with 
<1.1.4>
Apr 13

[tickets] [opensaf:tickets] #1494 imm: missmatch in obj_create_rsp event type

2015-09-22 Thread Anders Bjornerstedt

- **Version**:  --> 4.7



---

** [tickets:#1494] imm: missmatch in obj_create_rsp event type**

**Status:** review
**Milestone:** 4.7.FC
**Created:** Tue Sep 22, 2015 05:52 AM UTC by Neelakanta Reddy
**Last Updated:** Tue Sep 22, 2015 06:06 AM UTC
**Owner:** Neelakanta Reddy


immnd_evt.c:3756: immnd_evt_proc_ccb_obj_create_rsp: Assertion 'evt->type == 
IMMND_EVT_A2ND_CCB_OBJ_MODIFY_RSP_2' failed.

In the create_rsp handling, the modify_rsp_2 event type is used; it should be 
IMMND_EVT_A2ND_CCB_OBJ_CREATE_RSP.



---

___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-09-21 Thread Anders Bjornerstedt

- **status**: unassigned --> invalid
- **Comment**:

If the problem is to be declared as a configuration error, i.e. solvable by 
adjusting one or more configuration values that are documented by OpenSAF, then 
the ticket should be closed as invalid.



---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby 
controller rebooted in middle of IMMND sync**

**Status:** invalid
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Mon Sep 21, 2015 04:47 AM UTC
**Owner:** nobody
**Attachments:**

- 
[immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2)
 (6.8 MB; application/x-bzip)


The issue is observed with 4.6 FC changeset 6377. The system is up and running 
with a single PBE and 50k objects. This issue is seen after 
http://sourceforge.net/p/opensaf/tickets/1290 is observed. An IMM application 
is running on the standby controller and the immcfg command is run from a 
payload to set the CompRestartMax value to 1000. IMMND is killed twice on the 
standby controller, leading to #1290.

As a result, the standby controller left the cluster in the middle of the sync, 
IMMD reported a healthcheck callback timeout, and the active controller also 
went for reboot. The following is the syslog of SC-1:

Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 
2020f:
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 
131343, SupervisionTime = 60
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link 
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with 
<1.1.2>
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster
Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=2197', processed='center(received)=1172', 
processed='destination(messages)=1172', processed='destination(mailinfo)=0', 
processed='destination(mailwarn)=0', 
processed='destination(localmessages)=955', processed='destination(newserr)=0', 
processed='destination(mailerr)=0', processed='destination(netmgm)=0', 
processed='destination(warn)=44', processed='destination(console)=13', 
processed='destination(null)=0', processed='destination(mail)=0', 
processed='destination(xconsole)=13', processed='destination(firewall)=0', 
processed='destination(acpid)=0', processed='destination(newscrit)=0', 
processed='destination(newsnotice)=0', processed='source(src)=1172'
Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on 
saImmOmSearchNext - aborting
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED 
status:1
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: 
IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE (2484)
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting 
ABORT_SYNC, epoch:12
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing 
with ccbId:10054/4294967380
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation 
timer started (timeout: 12000 ns)
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO Performing failover of 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1)
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated

[tickets] [opensaf:tickets] #1269 IMM: Library side behavior at failure to allocatge memory needs to be consistent

2015-09-19 Thread Anders Bjornerstedt

- **Milestone**: 4.7.FC --> future



---

** [tickets:#1269] IMM: Library side behavior at failure to allocatge memory 
needs to be consistent**

**Status:** assigned
**Milestone:** future
**Created:** Tue Mar 17, 2015 10:24 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 04:08 PM UTC
**Owner:** Hung Nguyen


The IMM library/agent side (IMMA) should behave consistently and follow
a consistent coding pattern for dealing with failure to allocate memory.

The IMMA library is linked with an application process that is using the IMM
service. Failure to allocate memory is rare and means that the processor
where the application is executing is overloaded. Because the IMMA library is
hosted by an application, there is some merit in returning control to the
application and letting it decide how to escalate. This is "nice" towards the
application, making troubleshooting simpler for those responsible for the
application.

In terms of coding, the simplest solution possible should be used. The allowed
solutions in coding on the IMM library/agent side should be:

a) Return SA_AIS_ERR_NO_MEMORY
b) osafassert the pointer after malloc/calloc/strdup
c) Nothing, i.e. segv at the next dereference.

where (a) is recommended when the allocation error occurs close to the API;
(b) is recommended in deeper levels of function invocation;
(c) is allowed in legacy library code, but should be avoided in new/updated
code.

We need to allow (c) in the agent/library, otherwise this ticket would be a
defect ticket.

Writing explicit if statements checking for null and writing explicit customized
syslog error messages or trace messages is not allowed in the library for the
memory allocation failure case. Osafassert does write to the syslog, but that
is an allowed exception here.
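
Patterns (a) and (b) could look roughly like the sketch below. This is an illustration only, not IMMA code: `AisError`, `copyNameForApi`, `duplicateInternal`, `defaultAlloc` and `alwaysFail` are hypothetical names, plain `assert` stands in for osafassert, and the injectable allocator exists only so the failure path can be exercised.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the SaAisErrorT values relevant here.
enum class AisError { Ok, NoMemory };

using AllocFn = void* (*)(std::size_t);

void* defaultAlloc(std::size_t n) { return std::malloc(n); }
void* alwaysFail(std::size_t) { return nullptr; }  // simulates allocation failure

// Pattern (a): close to the API, report the failure to the application
// and let it decide how to escalate. No syslog or trace message is written.
AisError copyNameForApi(const char* name, char** out,
                        AllocFn alloc = defaultAlloc) {
    std::size_t len = std::strlen(name) + 1;
    char* buf = static_cast<char*>(alloc(len));
    if (buf == nullptr) {
        return AisError::NoMemory;
    }
    std::memcpy(buf, name, len);
    *out = buf;
    return AisError::Ok;
}

// Pattern (b): deeper in the call chain, assert on the pointer right
// after the allocation instead of threading an error code upwards.
char* duplicateInternal(const char* s) {
    std::size_t len = std::strlen(s) + 1;
    char* copy = static_cast<char*>(std::malloc(len));
    assert(copy != nullptr);  // stand-in for osafassert(copy)
    std::memcpy(copy, s, len);
    return copy;
}
```

Calling `copyNameForApi("safApp=x", &out, alwaysFail)` returns `AisError::NoMemory` and leaves `out` untouched, so the caller keeps control. In both functions the caller owns the returned buffer and releases it with `free`.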



---


[tickets] [opensaf:tickets] #1474 imm: Assigning default value to no-dangling attributes make cluster fail to start

2015-09-14 Thread Anders Bjornerstedt

- **status**: unassigned --> assigned
- **assigned_to**: Hung Nguyen



---

** [tickets:#1474] imm: Assigning default value to no-dangling attributes make 
cluster fail to start**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Thu Sep 10, 2015 02:27 PM UTC by Hung Nguyen
**Last Updated:** Mon Sep 14, 2015 09:39 AM UTC
**Owner:** Hung Nguyen


root@SC1:~# immlist -c Test
<< Test - CONFIG >>
test : SA_STRING_T [1] {RDN, CONFIG, INITIALIZED}
dep : SA_NAME_T [0] = test=1 (6)  {CONFIG, WRITEABLE, NO_DANGLING}

Create test=1 and test=2
root@SC1:~# immcfg -c Test test=1
root@SC1:~# immcfg -c Test test=2

Set the attribute with default value to empty.
root@SC1:~# immcfg -a dep= test=2
root@SC-1:~# immlist -a dep test=2
dep=

Now test=1 can be deleted
root@SC1:~# immcfg -d test=1

Reboot cluster and it will fail to start
Sep 10 21:03:36 SC1 osafimmloadd: NO * Loading from PBE file imm.db at 
/srv/shared/imm/ *
Sep 10 21:03:40 SC1 osafimmnd[421]: NO ERR_FAILED_OPERATION: NO_DANGLING 
reference (test=1) is dangling (Ccb 1)
Sep 10 21:03:40 SC1 osafimmnd[421]: NO Ccb 1 ABORTED (IMMLOADER)

[#1377]


---


[tickets] [opensaf:tickets] #1474 imm: Assigning default value to no-dangling attributes make cluster fail to start

2015-09-14 Thread Anders Bjornerstedt

- **Priority**: major --> critical
- **Comment**:

Raising severity to critical since the symptom caused by this defect is severe.

I propose that the solution is to not allow default values to be defined for 
attributes flagged with NO_DANGLING in the class definition.
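
The proposed rule can be sketched as a class-definition check. This is only an illustration of the proposal, not the immsv implementation: `AttrDef` and `findInvalidNoDanglingDefaults` are hypothetical names, and the attribute flags are reduced to two booleans.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of one attribute in a class definition.
struct AttrDef {
    std::string name;
    bool noDangling;           // attribute carries the NO_DANGLING flag
    bool hasDefault;           // class definition supplies a default value
    std::string defaultValue;  // only meaningful when hasDefault is true
};

// Returns the names of attributes that the proposed rule would reject:
// NO_DANGLING attributes that also define a default value.
std::vector<std::string> findInvalidNoDanglingDefaults(
        const std::vector<AttrDef>& attrs) {
    std::vector<std::string> rejected;
    for (const AttrDef& a : attrs) {
        if (a.noDangling && a.hasDefault) {
            rejected.push_back(a.name);
        }
    }
    return rejected;
}
```

Applied to the 'Test' class from the ticket, the `dep` attribute (NO_DANGLING with default `test=1`) would be reported and the class creation refused, while `test` would pass.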



---

** [tickets:#1474] imm: Assigning default value to no-dangling attributes make 
cluster fail to start**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Thu Sep 10, 2015 02:27 PM UTC by Hung Nguyen
**Last Updated:** Thu Sep 10, 2015 02:27 PM UTC
**Owner:** nobody




---


[tickets] [opensaf:tickets] #1472 imm: Default values are assigned to empty-valued attributes when sync

2015-09-14 Thread Anders Bjornerstedt

- **Priority**: major --> critical
- **Comment**:

Raising severity to critical for this ticket since the symptom is an 
inconsistency in the imm database between nodes.



---

** [tickets:#1472] imm: Default values are assigned to empty-valued attributes 
when sync**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Sep 09, 2015 11:01 AM UTC by Hung Nguyen
**Last Updated:** Thu Sep 10, 2015 02:30 PM UTC
**Owner:** Hung Nguyen


root@SC-1:~# immlist -c Test
<< Test - CONFIG >>
test : SA_STRING_T [1] {RDN, CONFIG, INITIALIZED}
attr : SA_INT64_T [0] = 100 (0x64) {CONFIG, WRITEABLE}


On SC-1, set an attribute that has a default value to empty.

root@SC-1:~# immcfg -c Test test=1
root@SC-1:~# immcfg -a attr= test=1

root@SC-1:~# immlist -a attr test=1
attr=


Let another node join the cluster. On that node, list the value of the attribute:

root@PL-3:~# immlist -a attr test=1
attr=100

There's a mismatch between the nodes.
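
The mismatch can be pictured with a small model that separates object creation from sync. This is a sketch under assumptions, not the IMMND sync code: `Attr`, `onCreate` and `onSync` are hypothetical names, and an empty vector is used to model an attribute that was explicitly set to empty.

```cpp
#include <string>
#include <vector>

// Hypothetical attribute model: an empty vector means "attr=" (no value).
struct Attr {
    std::vector<std::string> values;
};

// Creation: a missing value is filled in from the class default.
Attr onCreate(const Attr& requested, const std::string& classDefault) {
    Attr a = requested;
    if (a.values.empty()) {
        a.values.push_back(classDefault);
    }
    return a;
}

// Sync: the receiving node should copy the coordinator's state verbatim,
// so an attribute that was set to empty stays empty.
Attr onSync(const Attr& fromCoordinator) {
    return fromCoordinator;
}
```

If the sync client treats incoming objects like `onCreate` instead of `onSync`, an empty `attr` on SC-1 turns into `attr=100` on the joining node, which matches the symptom above.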



---


[tickets] [opensaf:tickets] Re: #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

2015-08-26 Thread Anders Bjornerstedt

End users of recent releases, i.e. the previous releases, have not seen this 
problem.
At least no report of the problem had been created until a few weeks ago.
It only occurs in overload situations and has only been seen in recent testing 
with an overloaded system.
New ways of testing are always good.
But testing overload on a system with no load regulation will always find the 
next bottleneck symptom.
We can play that game indefinitely, adding defect upon defect.
Or we can provide some form of load-regulation mechanism for OpenSAF.

It is also ironic that we need to fix this particular overload issue on old 
releases at the same time as we are ripping up existing time release plans and 
suddenly declaring we are going to one-track development.

Personally I am increasingly frustrated by the deterioration in following the 
rules of the ticket system.
Why not just drop the distinction between enhancement and defect?
No one seems to care (or bother) about this distinction any more.

The main reason for the distinction (I thought) was to provide an increased 
degree of stability on older branches.

New features always mean new risk, at least in the short term, i.e. the first 
release in which a new feature (enhancement) occurs.
But no one seems to care about that.

/AndersBj






From: Mathi Naickan [mailto:mathi-naic...@users.sf.net]
Sent: den 25 augusti 2015 16:37
To: opensaf-tickets@lists.sourceforge.net
Subject: [tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by 
retrying on ERR_NO_RESOURCES


I think it is more unfair to the end users of recent releases not to pass on 
the benefit of an optimization or fix for an issue just because it was 
uncovered/hit late! Especially when the fix does no harm and only helps the 
campaign succeed. Maybe in the case of this ticket there is more to help the 
user and nothing to harm the code path! Also, the facts that this is not a 
newly introduced error code, and that IMM API users have not met the 
expectation set by IMM to handle this like TRY_AGAIN, call for this to be a 
defect.



[tickets:#1448]http://sourceforge.net/p/opensaf/tickets/1448/ smf: Make 
campaigns less fragile by retrying on ERR_NO_RESOURCES

Status: unassigned
Milestone: future
Created: Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
Last Updated: Tue Aug 25, 2015 11:14 AM UTC
Owner: nobody

The SMF service is a heavy user of the IMM service.
The IMM has an established client pattern for ERR_TRY_AGAIN which allows an 
application realtime control over how long it is prepared to wait for a 
transient inability of the IMM service to fulfill a request.
Each response of TRY_AGAIN should in itself be fast, so the application needs a 
delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES. Logically that 
error code is identical to TRY_AGAIN in that the request could not be accepted 
due to no fault of the client but due to some more or less temporary problem in 
the IMM service. The difference is that NO_RESOURCES has no realtime ambitions. 
Typically this error code is used by the imm when the imm cannot fulfill a 
request due to reasons that are outside of the imm service's control. Also the 
time from request to a response of ERR_NO_RESOURCES may be long.

The SMF service in general has no realtime requirements. The main goal for the 
SMF service is to successfully complete correctly formulated campaigns. This 
means that the SMF service should be programmed to avoid unnecessary fragility 
related to temporary problems, even if the temporary problem could linger for 
seconds or minutes.

The alternative of aborting the campaign will itself discard potentially large 
execution times already completed. It may sometimes even result in a system 
restore.

This means that SMF campaigns should have a retry loop that handles not just 
TRY_AGAIN, but also ERR_NO_RESOURCES where this return code is relevant (can be 
returned according to the API spec). The error code ERR_BUSY also exists and is 
for all practical purposes identical to ERR_NO_RESOURCES in semantics, both 
logical and timing.
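
A retry loop of the kind described could be sketched as follows. This is a minimal illustration, not SMF code: `AisError`, `isTransient` and `retryTransient` are hypothetical names, the enum only stands in for the corresponding SaAisErrorT values, and the attempt limit and delay are parameters the caller would have to choose.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the SaAisErrorT values discussed above.
enum class AisError { Ok, TryAgain, NoResources, Busy, FailedOperation };

// Errors that mean "not the client's fault, possibly temporary".
bool isTransient(AisError e) {
    return e == AisError::TryAgain || e == AisError::NoResources ||
           e == AisError::Busy;
}

// Calls `call` until it returns a non-transient result or maxAttempts is
// reached, sleeping `delay` between attempts. Returns the last result.
// With maxAttempts <= 0 the call is never made and TryAgain is returned.
AisError retryTransient(const std::function<AisError()>& call, int maxAttempts,
                        std::chrono::milliseconds delay) {
    AisError rc = AisError::TryAgain;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        rc = call();
        if (!isTransient(rc)) {
            return rc;
        }
        if (attempt + 1 < maxAttempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return rc;
}
```

A campaign step would wrap each IMM call in a lambda and pass it to `retryTransient`. Since NO_RESOURCES and BUSY responses may themselves be slow, a generous attempt limit matters more here than a short delay.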






---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on 
ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 03:03 PM UTC
**Owner:** nobody



[tickets] [opensaf:tickets] Re: #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

2015-08-26 Thread Anders Bjornerstedt

Yes the principle about handling ERR_NO_RESOURCES should be the same everywhere 
over all SAF services.
Just as the rules for handling TRY_AGAIN should be the same over all OpenSAF 
services.

Any client-application is free to decide to not handle these errors, i.e. to 
stop trying if they get them.
But applications can be made more robust by handling these errors.

There is also ERR_BUSY which for the immsv works exactly the same way as 
ERR_NO_RESOURCES.
SAF created too many error codes as I see it.
There should only be one error code for any particular handling behavior 
defined as appropriate for the error.
If two error codes are to be handled exactly the same then one of the error 
codes should be deprecated.

/AndersBJ


From: Mathi Naickan [mailto:mathi-naic...@users.sf.net]
Sent: den 25 augusti 2015 17:03
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying 
on ERR_NO_RESOURCES


Just as a note - previously, I had a discussion with Ingvar and he had agreed 
to convert this into a defect.
I can provide a patch for this for OM api calls except for the CCB APIs (based 
on the description above).
Should we also give this treatment for OI APIs?



[tickets:#1448]http://sourceforge.net/p/opensaf/tickets/1448/ smf: Make 
campaigns less fragile by retrying on ERR_NO_RESOURCES

Status: unassigned
Milestone: future
Created: Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
Last Updated: Tue Aug 25, 2015 02:37 PM UTC
Owner: nobody




Sent from sourceforge.net because you indicated interest in 
https://sourceforge.net/p/opensaf/tickets/1448/

To unsubscribe from further messages, please visit 
https://sourceforge.net/auth/subscriptions/



---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on 
ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 03:03 PM UTC
**Owner:** nobody



[tickets] [opensaf:tickets] #1458 AMF: Not possible to add/remove configuration for one node in one single CCB

2015-08-25 Thread Anders Bjornerstedt

- **summary**: Not possible to add/remove configuration for one node in one 
single CCB --> AMF: Not possible to add/remove configuration for one node in 
one single CCB



---

** [tickets:#1458] AMF: Not possible to add/remove configuration for one node 
in one single CCB**

**Status:** accepted
**Milestone:** 4.6.1
**Created:** Tue Aug 25, 2015 05:11 AM UTC by Gary Lee
**Last Updated:** Tue Aug 25, 2015 05:24 AM UTC
**Owner:** Gary Lee


When adding a node to scale out, it is not possible to do that in one CCB.
The safAmfNode cannot be created together with the rest of the configuration.

When removing a node, three CCBs are needed, since the safAmfNode cannot be 
deleted and removed from the safAmfNodeGroups in the same CCB. Removing the 
safAmfNode from safAmfNodeGroups in the same CCB as the rest of the 
configuration that needs to be removed/updated is not possible either.

SCALE_OUT:
Two CCBs work, but using only one CCB fails:


error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: SG or node not configured properly to allow creation of UNLOCKED SU

  2171 16:32:40 06/10/2015 WA safApp=safAmfService Create 'safSu=dae1f5de53,safSg=NWayActive,safApp=ABC', configured with a non existing node (safAmfNode=PL-6,safAmfCluster=myAmfCluster)
  2172 16:32:40 06/10/2015 NO safApp=safAmfService CCB 575 creation of 'safSu=dae1f5de53,safSg=NWayActive,safApp=ABC' failed
  2173 16:32:40 06/10/2015 NO safApp=safAmfService CCB 575 validation error: SG or node not configured properly to allow creation of UNLOCKED SU
  2174 16:32:42 06/10/2015 WA safApp=safAmfService Create 'safSu=dae1f5de53,safSg=NWayActive,safApp=ABC', configured with a non existing node (safAmfNode=PL-6,safAmfCluster=myAmfCluster)


NOTE: 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' is created first in the CCB!
ccb.add: immcfg -c SaAmfNode safAmfNode=PL-6,safAmfCluster=myAmfCluster -a saAmfNodeSuFailoverMax=2 -a saAmfNodeSuFailOverProb=12000 -a saAmfNodeFailfastOnTerminationFailure=1 -a saAmfNodeFailfastOnInstantiationFailure=0 -a saAmfNodeClmNode=safNode=PL-6,safCluster=myClmCluster -a saAmfNodeAutoRepair=1 -a saAmfNodeAdminState=3

SCALE_IN:
Only a sequence of three CCBs works.
Two problems are listed below:
1. The scale-in script cannot update node groups even though the dependent SU
is deleted in the same CCB:

 1627 15:04:06 06/10/2015 NO safApp=safAmfService CCB 477 validation error: Cannot delete 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' from 'safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster'. An SU is mapped using node group


2. Removing the node from the node group and deleting the AMF node in the same
CCB does not work.
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' exists in the nodegroup 'safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster'


1689 15:18:49 06/10/2015 NO safApp=safAmfService CCB 488 validation error: 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' exists in the nodegroup 'safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster'



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

2015-08-25 Thread Anders Bjornerstedt

There are at least three points to make in response to the claim that this has
to be a defect.

1) We have not seen this problem earlier. So obviously testing is different
this time, i.e. this is a new way of testing that was not performed when
testing the earlier releases.

2) Saying that this fix is the only way for this campaign to succeed is not
true unless you show that the problem is not performance related. I am
convinced that the root cause is very much performance related. So the very
same campaign most likely succeeds, and probably has succeeded in earlier
releases, simply because the platform it was tested on had a more reasonable
load/capacity ratio.

3) I have noticed that there is lately a tendency to stress test OpenSAF more
often with a higher load/capacity ratio, at least here at Ericsson, for various
reasons. Probably it is related to the more volatile capacity of virtualized
and/or cloud based platforms, in particular when they are being reconfigured.

What I am basically saying is that it is always possible to increase the
load/capacity ratio until you do see a resource related problem occur in the
system. It is a bit unfair to then declare that problem a defect, particularly
when the effect is benign. In this case an SMF campaign gets aborted, but in a
controlled way.

OpenSAF has no load regulation, so OpenSAF is currently vulnerable to getting
stuck in resource problems. OpenSAF does have partial overload protection in
the IMM service and this is what is getting triggered here (max outstanding
fevs messages at the local IMMND, a type of flow control).

On the other hand, if this is really a practical and real problem also for
deployments on old OpenSAF releases being used in new ways in *production*,
i.e. there is a plan to regularly run with overloaded capacity in production,
then one could declare this as a defect, even if it is a bit unfair.
   


---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on 
ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 09:28 AM UTC
**Owner:** nobody


The SMF service is a heavy user of the IMM service.
The IMM has an established client pattern for ERR_TRY_AGAIN which allows an
application realtime control over how long it is prepared to wait for a
transient inability of the IMM service to fulfill a request.
Each response of TRY_AGAIN should in itself be fast, so the application needs a
delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES. Logically that
error code is identical to TRY_AGAIN in that the request could not be accepted
due to no fault of the client but due to some more or less temporary problem in
the IMM service. The difference is that NO_RESOURCES has no realtime ambitions.
Typically this error code is used by the IMM when it cannot fulfill a request
for reasons that are outside the control of the IMM service. Also, the time
from request to a response of ERR_NO_RESOURCES may be long.

The SMF service in general has no realtime requirements. The main goal for the
SMF service is to successfully complete correctly formulated campaigns. This
means that the SMF service should be programmed to avoid unnecessary fragility
related to temporary problems, even if the temporary problem could linger for
seconds or minutes.

The alternative of aborting the campaign will itself discard potentially large
execution times already completed. It may sometimes even result in a system
restore.

This means that SMF campaigns should have a retry loop that handles not just
TRY_AGAIN, but also ERR_NO_RESOURCES where this return code is relevant (can be
returned according to the API spec). The error code ERR_BUSY also exists and is
for all practical purposes identical to ERR_NO_RESOURCES in semantics, both
logical and timing.
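
A minimal sketch of such a retry loop follows. It is not SMF code: the `rc_t` enum is a local stand-in for the SaAisErrorT codes named above, and `imm_call_fn`, `delay_fn`, `call_with_retry` and the `flaky` test double are hypothetical names introduced only for illustration. The attempt limit and delay are parameters precisely because NO_RESOURCES and BUSY may warrant a more patient budget than TRY_AGAIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the SaAisErrorT codes named above; not the real values. */
typedef enum {
    RC_OK,
    RC_TRY_AGAIN,
    RC_NO_RESOURCES,
    RC_BUSY,
    RC_OTHER
} rc_t;

/* Hypothetical shape of "one IMM downcall" and of the delay between attempts. */
typedef rc_t (*imm_call_fn)(void *ctx);
typedef void (*delay_fn)(unsigned ms);

/* True for codes that are not the client's fault and may clear up later. */
static bool is_transient(rc_t rc)
{
    return rc == RC_TRY_AGAIN || rc == RC_NO_RESOURCES || rc == RC_BUSY;
}

/* Call until a non-transient code is returned or max_attempts is used up.
 * Returns the code from the last attempt. delay may be NULL (no waiting). */
static rc_t call_with_retry(imm_call_fn call, void *ctx,
                            unsigned max_attempts, unsigned delay_ms,
                            delay_fn delay)
{
    assert(call != NULL && max_attempts > 0);
    rc_t rc = RC_OTHER;
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        rc = call(ctx);
        if (!is_transient(rc))
            break;
        if (attempt < max_attempts && delay != NULL)
            delay(delay_ms);
    }
    return rc;
}

/* Test double: reports fail_with for the first failures_left calls, then OK. */
struct flaky {
    unsigned failures_left;
    rc_t fail_with;
    unsigned calls;
};

static rc_t flaky_call(void *ctx)
{
    struct flaky *f = ctx;
    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return f->fail_with;
    }
    return RC_OK;
}
```

In a real client the `delay` callback would wrap something like a sleep, and the wrapped call would be the actual IMM downcall. The point of the sketch is only that all three transient codes go through the same loop.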



[tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

2015-08-25 Thread Anders Bjornerstedt

I should also clarify that there is a distinction between (a) getting
ERR_NO_RESOURCES as the direct result from an IMM API call (in the above case a
search or accessorGet used by SMF); and (b) determining that a CCB was aborted
due to a resource error and not a validation error (new API enhancement #1449).
In both cases it means that the thing/request was rejected/aborted for resource
reasons. But the handling of retry is different.

If the user (SMF) directly gets ERR_NO_RESOURCES returned on a call, then that
specific call can be retried.

But if the user (SMF) determines that a CCB has been aborted
(ERR_FAILED_OPERATION) due to a resource failure (return value false on
argument 'isValidationAbort' for the new API saImmOmCcbGetAbortReason), then a
replay of the whole CCB can be attempted. But it makes no sense here to retry
the last CCB related downcall (ccbApply or ccbValidate or ccbObjectCreate..)
since the CCB has been aborted.

This distinction should be simple because in the resource aborted CCB case you
don't actually get SA_AIS_ERR_NO_RESOURCES as a return code.

SMF campaign robustness can be improved on both aspects when #1449 has been
delivered.
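
The two cases can be sketched as a small decision function. This is an illustration under assumptions, not SMF code: `rc_t` and `action_t` are local stand-ins, and `is_validation_abort` represents the value that the proposed saImmOmCcbGetAbortReason (#1449) would report. It is only consulted when the call returned FAILED_OPERATION.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the return codes discussed above; not the real values. */
typedef enum {
    RC_OK,
    RC_TRY_AGAIN,
    RC_NO_RESOURCES,
    RC_FAILED_OPERATION,
    RC_OTHER
} rc_t;

/* What the caller should do next (hypothetical names for this sketch). */
typedef enum {
    ACT_DONE,        /* call succeeded */
    ACT_RETRY_CALL,  /* case (a): retry this specific call */
    ACT_REPLAY_CCB,  /* case (b): CCB is gone, rebuild and replay all of it */
    ACT_GIVE_UP      /* validation abort or some other error */
} action_t;

/* Map the outcome of one CCB related downcall to the next action.
 * is_validation_abort only matters when rc is RC_FAILED_OPERATION. */
static action_t next_action(rc_t rc, bool is_validation_abort)
{
    assert(rc >= RC_OK && rc <= RC_OTHER);
    switch (rc) {
    case RC_OK:
        return ACT_DONE;
    case RC_TRY_AGAIN:
    case RC_NO_RESOURCES:
        return ACT_RETRY_CALL;
    case RC_FAILED_OPERATION:
        /* The CCB has been aborted; retrying the last downcall is pointless. */
        return is_validation_abort ? ACT_GIVE_UP : ACT_REPLAY_CCB;
    default:
        return ACT_GIVE_UP;
    }
}
```

Replaying a validation abort would presumably fail the same way again, which is why only the resource abort maps to a replay here.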


---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on 
ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 10:56 AM UTC
**Owner:** nobody





[tickets] [opensaf:tickets] #1456 imm: IMM object has different attribute value after reloading it from PBE

2015-08-24 Thread Anders Bjornerstedt

- **status**: unassigned -- invalid
- **Milestone**: 4.5.2 -- 4.7-Tentative
- **Comment**:

This behavior is well documented in the immsv README (and should also be so in 
the IMMSV_PR) since OpenSAF4.3. It has worked this way since OpenSAF was 
created.

My opinion is that the real problem is the strange definition of default in SAF
IMM as being tied to object-create. A better definition of attribute default
value would be what I call strong default. This means that if you have defined
a default then it is impossible to assign the empty value (NULL) to that
attribute. An attempt to do so will result in the IMM replacing it with the
default value. That is a clean and more normal definition of a default.
See ticket #1425.

In addition, no one has complained about this that I am aware of and some could 
potentially even depend on this behavior.  An application that wants to assign 
the NULL/empty value to an attribute should not use the current default value 
mechanism. From the README: 

Common misunderstandings about attribute defaults.
---
Imm class definitions allow the declaration of a default value to be defined
as part of an attribute definition.

(i) A default declaration is only allowed for single valued attributes (no
concept of a multivalued default exists).

(ii) Default values are assigned at object creation. Default values are NOT
assigned if an attribute is set to the empty/null value by a modification.

(iii) Default values are assigned at cluster restart for any attributes that
are null/empty and that have a default. This is a special case of (ii) because
imm loading actually uses the regular imm API to recreate the imm contents.
In particular, saImmOmCcbObjectCreate is used to recreate all objects from
the file-system image.
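
The README rules quoted above can be modeled in a few lines. This is a toy model, not IMM code: `attr_def_t`, `attr_val_t`, `on_create`, `on_modify` and `on_reload` are hypothetical names, and the attribute is reduced to a single int that may be null.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Class-level definition of a single valued attribute (toy model). */
typedef struct {
    bool has_default;
    int default_value;
} attr_def_t;

/* Object-level value; is_null models the empty/NULL value. */
typedef struct {
    bool is_null;
    int value;
} attr_val_t;

/* Object creation: an empty attribute that has a default gets the default. */
static attr_val_t on_create(const attr_def_t *def, attr_val_t given)
{
    assert(def != NULL);
    if (given.is_null && def->has_default) {
        given.is_null = false;
        given.value = def->default_value;
    }
    return given;
}

/* Modification: the value is stored as given; NULL stays NULL. */
static attr_val_t on_modify(const attr_def_t *def, attr_val_t given)
{
    (void)def; /* the default plays no role on modification */
    return given;
}

/* Reload: objects are recreated through the create path, so an attribute
 * that was persisted as NULL comes back with its default. */
static attr_val_t on_reload(const attr_def_t *def, attr_val_t persisted)
{
    return on_create(def, persisted);
}
```

Creating an object with the attribute left empty, setting it to NULL by modification and then reloading ends with the default value again, which is the behavior the ticket reports.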






---

** [tickets:#1456] imm: IMM object has different attribute value after 
reloading it from PBE**

**Status:** invalid
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 21, 2015 01:05 PM UTC by Anders Widell
**Last Updated:** Fri Aug 21, 2015 01:05 PM UTC
**Owner:** nobody


There is a scenario when an IMM object can be different after reloading it from 
PBE, compared to what it looked like when it was saved to PBE. This happens 
when an attribute has a default value in the class definition, but the 
attribute value has been deleted (been set to NULL) in the object. When 
reloading the object from PBE, the attribute will again be set to the default 
value.



[tickets] [opensaf:tickets] #1445 imm: Don't check for pending fevs when only updating pure runtime attributes

2015-08-21 Thread Anders Bjornerstedt

- **summary**: imm: Don't check for pending fevs when updating pure runtime 
attributes -- imm: Don't check for pending fevs when only updating pure 
runtime attributes
- **Comment**:

The optimization is only for the pure and local case.
This should typically be the case for a pure RTA update, which should only be 
the result
of a read request from some client. 
However, the saImmOiRtObjectUpdate API actually leaves it open to the OI to 
allow
it to update both pure RTAs and cached RTAs in the same call. 
Probably no one has tested that mixed variant since it has no clear use-case.



---

** [tickets:#1445] imm: Don't check for pending fevs when only updating pure 
runtime attributes**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Thu Aug 13, 2015 09:48 AM UTC by Hung Nguyen
**Last Updated:** Fri Aug 14, 2015 06:21 AM UTC
**Owner:** Neelakanta Reddy


When invoking saImmOiRtObjectUpdate(), the number of pending fevs messages is
always checked on the server side and TRY_AGAIN is returned when it reaches
IMMSV_DEFAULT_FEVS_MAX_PENDING.

If the attributes to be updated are pure runtime attributes, the number of
pending fevs messages should not be checked because the
IMMD_EVT_ND2D_OI_OBJ_MODIFY message wouldn't be sent out to broadcast to other
IMMNDs.
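
The intended check can be sketched as follows. This is an assumption-laden illustration, not the IMMND implementation: `attr_kind_t`, `update_needs_fevs` and `reply_try_again` are hypothetical names, the limit is passed in as a parameter rather than taken from IMMSV_DEFAULT_FEVS_MAX_PENDING, and the "local" aspect of the optimization is not modeled. A mixed update (pure and cached RTAs in one call) still counts as needing fevs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kind of runtime attribute touched by an update (toy model). */
typedef enum {
    ATTR_PURE_RUNTIME,
    ATTR_CACHED_RUNTIME
} attr_kind_t;

/* An update only has to be broadcast over fevs if it touches at least one
 * cached runtime attribute. */
static bool update_needs_fevs(const attr_kind_t *kinds, size_t n)
{
    assert(kinds != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (kinds[i] == ATTR_CACHED_RUNTIME)
            return true;
    }
    return false;
}

/* TRY_AGAIN is only warranted when the update needs fevs and the number of
 * pending fevs messages has reached the limit. */
static bool reply_try_again(const attr_kind_t *kinds, size_t n,
                            unsigned pending, unsigned max_pending)
{
    return update_needs_fevs(kinds, n) && pending >= max_pending;
}
```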



[tickets] [opensaf:tickets] #1449 IMM: CCB interface for probing abort reason (validation error or resource error)

2015-08-20 Thread Anders Bjornerstedt

- **status**: unassigned -- assigned
- **assigned_to**: Zoran Milinkovic



---

** [tickets:#1449] IMM: CCB interface for probing abort reason (validation 
error or resource error)**

**Status:** assigned
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 08:33 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:34 AM UTC
**Owner:** Zoran Milinkovic


Suggested interface, closely related to saImmOmCcbGetErrorStrings():

extern SaAisErrorT saImmOmCcbGetAbortReason(SaImmCcbHandleT ccbHandle,
                                            SaBoolT *isValidationAbort);

Arguments:
   ccbHandle (in)           - The ccb handle.
   isValidationAbort (out)  - SA_TRUE if validation abort, otherwise resource abort.
Return Values:
   SA_AIS_OK
   SA_AIS_ERR_BAD_HANDLE    - bad ccb handle.
   SA_AIS_ERR_INVALID_PARAM - handle is associated with ccb that is NOT aborted.
   SA_AIS_ERR_VERSION       - (not using A.2.xx)



[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-08-19 Thread Anders Bjornerstedt

- **Component**: imm -- mds
- **Comment**:

The IMMD is blocked on the (asynchronous) broadcast of *one* fevs message for
more than 3 minutes. Changing component to MDS.

A relevant question is what is otherwise special about this test.
Is MDS TCP used and not TIPC? (MDS broadcast uses TIPC multicast, which is
faster.)

Clearly something is over/under dimensioned in this system.
This test configuration probably needs special configuration for MDS and/or IMM
(max sync batch size).

Again, I don't see that this is a defect in IMM.



---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby 
controller rebooted in middle of IMMND sync**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Wed Aug 19, 2015 09:27 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2)
 (6.8 MB; application/x-bzip)


The issue is observed with 4.6 FC changeset 6377. The system is up and running 
with single pbe and 50k objects. This issue is seen after 
http://sourceforge.net/p/opensaf/tickets/1290 is observed. IMM application is 
running on standby controller and immcfg command is run from payload to set 
CompRestartMax value to 1000. IMMND is killed twice on standby controller 
leading to #1290.

As a result, standby controller left the cluster in middle of sync, IMMD 
reported healthcheck callback timeout and the active controller too went for 
reboot. Following is the syslog of SC-1:

Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 
2020f:
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 
131343, SupervisionTime = 60
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link 
1.1.1:eth0-1.1.2:eth0, peer not responding
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with 
1.1.2
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster
Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=2197', processed='center(received)=1172', 
processed='destination(messages)=1172', processed='destination(mailinfo)=0', 
processed='destination(mailwarn)=0', 
processed='destination(localmessages)=955', processed='destination(newserr)=0', 
processed='destination(mailerr)=0', processed='destination(netmgm)=0', 
processed='destination(warn)=44', processed='destination(console)=13', 
processed='destination(null)=0', processed='destination(mail)=0', 
processed='destination(xconsole)=13', processed='destination(firewall)=0', 
processed='destination(acpid)=0', processed='destination(newscrit)=0', 
processed='destination(newsnotice)=0', processed='source(src)=1172'
Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on 
saImmOmSearchNext - aborting
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED 
status:1
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: 
IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE- 
IMM_NODE_FULLY_AVAILABLE (2484)
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting 
ABORT_SYNC, epoch:12
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing 
with ccbId:10054/4294967380
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation 
timer started (timeout:

[tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-08-19 Thread Anders Bjornerstedt

Ok but then the question simply becomes why does the healthcheck callback not 
reach the IMMND or why does the IMMND reply
not reach the AMFND ?

/AndersBj



From: Sirisha Alla [mailto:al...@users.sf.net]
Sent: den 19 augusti 2015 10:50
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when 
standby controller rebooted in middle of IMMND sync


This issue is reproduced on changeset 6744. Syslog as follows:

Aug 19 11:54:13 SLES-64BIT-SLOT1 osafimmnd[5969]: NO implementer for class 
'SaSmfSwBundle' is safSmfService = class extent is safe.
Aug 19 11:54:13 SLES-64BIT-SLOT1 osafamfnd[6054]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Aug 19 11:54:13 SLES-64BIT-SLOT1 opensafd: OpenSAF(4.7.M0 - ) services 
successfully started
Aug 19 11:54:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully announced dump 
at node 2010f. New Epoch:27
..
Aug 19 12:00:12 SLES-64BIT-SLOT1 kernel: [ 4223.945761] TIPC: Established link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO New IMMND process is on 
STANDBY Controller at 2020f
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Extended intro from node 
2020f
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: WA IMMND on controller (not 
currently coord) requests sync
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Node 2020f request sync 
sync-pid:5221 epoch:0
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Announce sync, epoch:30
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO SERVER STATE: 
IMM_SERVER_READY -- IMM_SERVER_SYNC_SERVER
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE- 
IMM_NODE_R_AVAILABLE
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully announced 
sync. New ruling epoch:30
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: logtrace: trace enabled to file 
/var/log/opensaf/osafimmnd, mask=0x
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafamfd[6044]: NO Node 'PL-3' left the cluster
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went down. Not 
sending track callback for agents on that node
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went down. Not 
sending track callback for agents on that node
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Global discard node 
received for nodeId:2030f pid:16584
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer disconnected 
15 0, 2030f(down) (MsgQueueService131855)
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876089] TIPC: Resetting link 
1.1.1:eth0-1.1.3:eth0, peer not responding
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876098] TIPC: Lost link 
1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.877196] TIPC: Lost contact with 
1.1.3
Aug 19 12:00:46 SLES-64BIT-SLOT1 kernel: [ 4257.206593] TIPC: Established link 
1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on 
saImmOmSearchNext - aborting
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: ER SYNC APPARENTLY FAILED 
status:1
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO -SERVER STATE: 
IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE- 
IMM_NODE_FULLY_AVAILABLE (2484)
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Epoch set to 30 in ImmModel
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord broadcasting 
ABORT_SYNC, epoch:30
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 30 committing 
with ccbId:10006/4294967302
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964128] TIPC: Resetting link 
1.1.1:eth0-1.1.3:eth0, peer not responding
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964145] TIPC: Lost link 
1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964157] TIPC: Lost contact with 
1.1.3
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: WA PBE process 5994 appears 
stuck on runtime data handling - sending SIGTERM
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: NO IMM PBE received SIG_TERM, 
closing db handle
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: IN IMM PBE process EXITING...
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer locally 
disconnected. Marking it as doomed 11 316, 2010f (OpenSafImmPBE)
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: WA Persistent back-end 
process has apparently died.
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord broadcasting 
PBE_PRTO_PURGE_MUTATIONS, epoch:30
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO ImmModel::getPbeOi reports 
missing PbeOi locally = unsafe
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord broadcasting 
PBE_PRTO_PURGE_MUTATIONS, epoch:30
Aug 19 12:04:30 SLES-64BIT-SLOT1 osafimmnd[5969]: NO ImmModel::getPbeOi reports 
missing PbeOi

[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-08-19 Thread Anders Bjornerstedt

The IMMD sends one fevs message at a time for each poll cycle.
Also in each poll cycle it checks the AMF descriptor for healthcheck callbacks.

This means that the IMMD is blocked for more than 3 minutes on broadcasting one 
fevs message.

IMMSV_DEFAULT_FEVS_MAX_PENDING affects the IMMND process, not the IMMD.

FEVS_MAX_PENDING is there precisely to avoid overloading the IMMD.


---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby 
controller rebooted in middle of IMMND sync**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Wed Aug 19, 2015 09:24 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2)
 (6.8 MB; application/x-bzip)


The issue is observed with 4.6 FC changeset 6377. The system is up and running 
with single pbe and 50k objects. This issue is seen after 
http://sourceforge.net/p/opensaf/tickets/1290 is observed. IMM application is 
running on standby controller and immcfg command is run from payload to set 
CompRestartMax value to 1000. IMMND is killed twice on standby controller 
leading to #1290.

As a result, standby controller left the cluster in middle of sync, IMMD 
reported healthcheck callback timeout and the active controller too went for 
reboot. Following is the syslog of SC-1:

Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 
2020f:
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 
131343, SupervisionTime = 60
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link 
1.1.1:eth0-1.1.2:eth0, peer not responding
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with 
1.1.2
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster
Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=2197', processed='center(received)=1172', 
processed='destination(messages)=1172', processed='destination(mailinfo)=0', 
processed='destination(mailwarn)=0', 
processed='destination(localmessages)=955', processed='destination(newserr)=0', 
processed='destination(mailerr)=0', processed='destination(netmgm)=0', 
processed='destination(warn)=44', processed='destination(console)=13', 
processed='destination(null)=0', processed='destination(mail)=0', 
processed='destination(xconsole)=13', processed='destination(firewall)=0', 
processed='destination(acpid)=0', processed='destination(newscrit)=0', 
processed='destination(newsnotice)=0', processed='source(src)=1172'
Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on 
saImmOmSearchNext - aborting
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED 
status:1
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: 
IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE- 
IMM_NODE_FULLY_AVAILABLE (2484)
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting 
ABORT_SYNC, epoch:12
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing 
with ccbId:10054/4294967380
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation 
timer started (timeout: 12000 ns)
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO Performing failover of 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count:

[tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-08-19 Thread Anders Bjornerstedt

Changeset 6744 was generated today.
So I assume this means you reproduced this today.

The IMMND main poll loop handles each descriptor in sequence, so it 
should not be possible for traffic on one descriptor to starve out a 
job on another.
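
The sequential-dispatch argument can be illustrated with a small sketch. It assumes a single-threaded loop that polls a set of descriptors and gives every ready descriptor exactly one turn per pass. The function and variable names (`dispatch_once`, `demo`, the handler table) are hypothetical and are not taken from the IMMND source.

```python
import os
import select


def dispatch_once(handlers, timeout=0.0):
    """Poll all descriptors once and serve every ready one in a fixed order.

    handlers: dict mapping a readable fd to a callable(fd).
    Returns the list of fds served in this pass.
    """
    ready, _, _ = select.select(list(handlers), [], [], timeout)
    served = []
    # Walk the descriptors in registration order. Each ready fd gets one
    # turn per pass, so a busy fd cannot monopolise the loop.
    for fd in handlers:
        if fd in ready:
            handlers[fd](fd)
            served.append(fd)
    return served


def demo():
    busy_r, busy_w = os.pipe()
    quiet_r, quiet_w = os.pipe()
    seen = {busy_r: 0, quiet_r: 0}

    def read_one(fd):
        os.read(fd, 1)  # consume a single byte per turn
        seen[fd] += 1

    handlers = {busy_r: read_one, quiet_r: read_one}
    os.write(busy_w, b"x" * 100)  # heavy traffic on one descriptor
    os.write(quiet_w, b"h")       # a single job on the other
    first_pass = dispatch_once(handlers)
    for fd in (busy_r, busy_w, quiet_r, quiet_w):
        os.close(fd)
    # True if both descriptors were served in the very first pass.
    return first_pass == [busy_r, quiet_r], seen[quiet_r]
```

In this model the quiet descriptor is served in the first pass even though the busy one still has 99 bytes queued. A real daemon can of course still stall if a single handler blocks, which is a separate question from starvation between descriptors.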

/AndersBj






From: Anders Bjornerstedt [mailto:ander...@users.sf.net]
Sent: den 19 augusti 2015 10:54
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout 
when standby controller rebooted in middle of IMMND sync


Ok, but then the question simply becomes: why does the healthcheck callback not 
reach the IMMND, or why does the IMMND reply not reach the AMFND?

/AndersBj

From: Sirisha Alla [mailto:al...@users.sf.net]
Sent: den 19 augusti 2015 10:50
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when 
standby controller rebooted in middle of IMMND sync

This issue is reproduced on changeset 6744. Syslog as follows:

Aug 19 11:54:13 SLES-64BIT-SLOT1 osafimmnd[5969]: NO implementer for class 
'SaSmfSwBundle' is safSmfService = class extent is safe.
Aug 19 11:54:13 SLES-64BIT-SLOT1 osafamfnd[6054]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Aug 19 11:54:13 SLES-64BIT-SLOT1 opensafd: OpenSAF(4.7.M0 - ) services 
successfully started
Aug 19 11:54:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully announced dump 
at node 2010f. New Epoch:27
..
Aug 19 12:00:12 SLES-64BIT-SLOT1 kernel: [ 4223.945761] TIPC: Established link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO New IMMND process is on 
STANDBY Controller at 2020f
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Extended intro from node 
2020f
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: WA IMMND on controller (not 
currently coord) requests sync
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Node 2020f request sync 
sync-pid:5221 epoch:0
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Announce sync, epoch:30
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO SERVER STATE: 
IMM_SERVER_READY -- IMM_SERVER_SYNC_SERVER
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE- 
IMM_NODE_R_AVAILABLE
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully announced 
sync. New ruling epoch:30
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: logtrace: trace enabled to file 
/var/log/opensaf/osafimmnd, mask=0x
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafamfd[6044]: NO Node 'PL-3' left the cluster
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went down. Not 
sending track callback for agents on that node
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went down. Not 
sending track callback for agents on that node
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Global discard node 
received for nodeId:2030f pid:16584
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer disconnected 
15 0, 2030f(down) (MsgQueueService131855)
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876089] TIPC: Resetting link 
1.1.1:eth0-1.1.3:eth0, peer not responding
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876098] TIPC: Lost link 
1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.877196] TIPC: Lost contact with 
1.1.3
Aug 19 12:00:46 SLES-64BIT-SLOT1 kernel: [ 4257.206593] TIPC: Established link 
1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on 
saImmOmSearchNext - aborting
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: ER SYNC APPARENTLY FAILED 
status:1
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO -SERVER STATE: 
IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE- 
IMM_NODE_FULLY_AVAILABLE (2484)
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Epoch set to 30 in ImmModel
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord broadcasting 
ABORT_SYNC, epoch:30
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 30 committing 
with ccbId:10006/4294967302
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964128] TIPC: Resetting link 
1.1.1:eth0-1.1.3:eth0, peer not responding
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964145] TIPC: Lost link 
1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964157] TIPC: Lost contact with 
1.1.3
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: WA PBE process 5994 appears 
stuck on runtime data handling - sending SIGTERM
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: NO IMM PBE received SIG_TERM, 
closing db handle
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: IN IMM PBE process EXITING...
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer locally 
disconnected. Marking it as doomed 11 316, 2010f (OpenSafImmPBE)
Aug 19 12:04:29

[tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-08-19 Thread Anders Bjornerstedt

Please reproduce with IMMD trace on.

/AndersBj

From: Sirisha Alla [mailto:sirisha.a...@oracle.com]
Sent: den 19 augusti 2015 11:07
To: [opensaf:tickets]; opensaf-tickets@lists.sourceforge.net
Subject: Re: [tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck 
callback timeout when standby controller rebooted in middle of IMMND sync

Yes, I tried this today. The healthcheck timeout happened on IMMD not on IMMND.

/Sirisha


[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-08-18 Thread Anders Bjornerstedt

- **Comment**:

I propose that we increase the saAmfHctDefMaxDuration value from 3 minutes to 
5 minutes in:

safHealthcheckKey=Default,safVersion=4.0.0,safCompType=OpenSafCompTypeIMMND

This is the only thing that can be done in the IMMSv.

The other alternatives are:
(1)
Place the ticket on MDS (since IMMND could only be blocked on an MDS 
asynchronous send).
Yes, asynchronous send. It must be blocked in the MDS library processing of 
packing a huge
message (sync buffer?) for asynchronous send. Stuck on MDS for 3 minutes.

(2) Close the ticket.
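
For reference, the proposed change from 3 to 5 minutes can be worked out as a number. This is a sketch that assumes saAmfHctDefMaxDuration is stored as a time value in nanoseconds, as SA Forum time attributes usually are; the helper name `minutes_to_ns` is made up for the illustration.

```python
NS_PER_SECOND = 1_000_000_000  # assumed unit of the attribute: nanoseconds


def minutes_to_ns(minutes):
    """Convert a duration in minutes to nanoseconds."""
    return minutes * 60 * NS_PER_SECOND


current_max_duration = minutes_to_ns(3)   # value implied by "3 minutes"
proposed_max_duration = minutes_to_ns(5)  # value implied by "5 minutes"

print(current_max_duration)
print(proposed_max_duration)
```

If the attribute uses a different unit in a given deployment, only `NS_PER_SECOND` needs to change.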




---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby 
controller rebooted in middle of IMMND sync**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Fri Aug 14, 2015 07:45 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2)
 (6.8 MB; application/x-bzip)


The issue is observed with 4.6 FC changeset 6377. The system is up and running 
with single pbe and 50k objects. This issue is seen after 
http://sourceforge.net/p/opensaf/tickets/1290 is observed. IMM application is 
running on standby controller and immcfg command is run from payload to set 
CompRestartMax value to 1000. IMMND is killed twice on standby controller 
leading to #1290.

As a result, standby controller left the cluster in middle of sync, IMMD 
reported healthcheck callback timeout and the active controller too went for 
reboot. Following is the syslog of SC-1:

Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 
2020f:
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 
131343, SupervisionTime = 60
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link 
1.1.1:eth0-1.1.2:eth0, peer not responding
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with 
1.1.2
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not 
sending track callback for agents on that node
Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster
Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link 
1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=2197', processed='center(received)=1172', 
processed='destination(messages)=1172', processed='destination(mailinfo)=0', 
processed='destination(mailwarn)=0', 
processed='destination(localmessages)=955', processed='destination(newserr)=0', 
processed='destination(mailerr)=0', processed='destination(netmgm)=0', 
processed='destination(warn)=44', processed='destination(console)=13', 
processed='destination(null)=0', processed='destination(mail)=0', 
processed='destination(xconsole)=13', processed='destination(firewall)=0', 
processed='destination(acpid)=0', processed='destination(newscrit)=0', 
processed='destination(newsnotice)=0', processed='source(src)=1172'
Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on 
saImmOmSearchNext - aborting
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED 
status:1
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: 
IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE- 
IMM_NODE_FULLY_AVAILABLE (2484)
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting 
ABORT_SYNC, epoch:12
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing 
with ccbId:10054/4294967380
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation

[tickets] [opensaf:tickets] Re: #1452 LOG: Use root name when searching for stream objects

2015-08-17 Thread Anders Bjornerstedt

Possibly yes.

You could look at it this way.
Every application is free to perform unnecessarily inefficient searches.
A global search actually causes the local IMMND to copy the entire database.
In this case the LOG subtree probably contains fewer than 10 objects.

In general, yes, optimizations are enhancements and not defects.

Now we have the situation that in OpenSAF 4.5 we introduced support for 
long DNs, and this is enabled by a config attribute in the IMMSv.

So suddenly what used to be simply a gross inefficiency in LOGSv has now 
also become a problem for users that want to deploy with long DNs but 
still want the LOG service configured.

There exists no other suitable solution as I see it.
There was talk of filtering, but an implicit filter does not work since 
it implicitly changes the semantics of a search.
An explicit filter would be possible, since then the application is at 
least saying that it is prepared to receive a partial result for the 
query. This works if the application somehow knows that all objects with 
long DNs are not relevant for it. In general that is a dangerous assumption.

But adding an explicit filter to the IMMSv is also an enhancement and 
quite frankly more work than the fix for the LOGSv to just scope its 
search appropriately to its own root object.
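
The difference between a global search and a search scoped to the LOG root can be shown with a toy model. This is only a sketch: the database is a plain Python list, the object names other than safApp=safLogService are invented, and `in_subtree` and `search` are hypothetical helpers, not IMM API calls.

```python
def in_subtree(dn, root):
    """True if dn is the root itself or lies below it (DNs grow leftwards)."""
    return root is None or dn == root or dn.endswith("," + root)


def search(dns, root=None):
    """Return the DNs a subtree-scope search rooted at `root` would visit."""
    return [dn for dn in dns if in_subtree(dn, root)]


LOG_ROOT = "safApp=safLogService"

# Hypothetical database content, for illustration only.
database = [
    LOG_ROOT,
    "safLgStrCfg=streamA," + LOG_ROOT,
    "safLgStrCfg=streamB," + LOG_ROOT,
    "safApp=OpenSAF",
    "appObj=" + "x" * 300 + ",safApp=SomeApplication",  # one very long DN
]

global_hits = search(database)            # no root: visits every object
scoped_hits = search(database, LOG_ROOT)  # rooted: visits the LOG subtree only
```

With the root set, the long DN under another application is never visited, so the searching service does not need to cope with it.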

/AndersBj



On 08/17/2015 11:17 AM, Mathi Naickan wrote:

 I guess we are not going round and round but probably just another 
 iteration on this topic! ;-)

 I think this is not about any particular IMM user, Note that there are 
 more services that are performing this kind of 'searching from root' 
 thing.
 The next question is what happens to the applications that have been 
 performing such search?

   * Mathi.

 






---

** [tickets:#1452] LOG: Use root name when searching for stream objects**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 12:34 PM UTC by elunlen
**Last Updated:** Mon Aug 17, 2015 09:17 AM UTC
**Owner:** elunlen


At startup the log server searches for stream configuration objects. The search 
is done with no root object defined (NULL pointer for rootName in parameter). 
Search root should be safApp=safLogService.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to http://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
http://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] Re: #1452 LOG: Use root name when searching for stream objects

2015-08-17 Thread Anders Bjornerstedt

If we are to apply it to more than the latest branch then we must declare it as 
a defect.
We do want to keep the ticket state signaling clean.

In general you don't want enhancements on old releases because every change of 
behavior has risk associated with it.
By only doing enhancements on the latest branch we keep the number of 
surprises down on the older branches.

So instead of starting to make exceptions in the handling of enhancements, we 
need to declare it as a defect to be applied on the latest 3 branches.

Long DNs were introduced in 4.5.

/AndersBj

From: elunlen [mailto:elun...@users.sf.net]
Sent: den 17 augusti 2015 12:52
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #1452 LOG: Use root name when searching for 
stream objects


Maybe this is an enhancement, but what is the problem with applying this change 
to all active branches anyway? It's the most practical thing to do regardless 
of any changes to the handling of long DNs that may be done in the future. This 
will not change any behavior of the log service, except that it may start a 
little bit faster and use fewer resources.

/Lennart


[tickets] [opensaf:tickets] Re: #1452 LOG: Use root name when searching for stream objects

2015-08-17 Thread Anders Bjornerstedt

Yes in my opinion.
But maybe we need a vote :)

/AndersBj

From: elunlen [mailto:elun...@users.sf.net]
Sent: den 17 augusti 2015 15:04
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1452 LOG: Use root name when searching for stream 
objects

Is it Ok then to push this fix to all active branches and keep this ticket a 
defect ticket?

---

** [tickets:#1452] LOG: Use root name when searching for stream objects**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 12:34 PM UTC by elunlen
**Last Updated:** Mon Aug 17, 2015 01:03 PM UTC
**Owner:** elunlen

At startup the log server searches for stream configuration objects. The search 
is done with no root object defined (NULL pointer for rootName in parameter). 
Search root should be safApp=safLogService.


[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

2015-08-14 Thread Anders Bjornerstedt

I suggest we close this defect ticket as not reproducible.
That is, unless someone can reproduce it.
My best guess is that this is yet another side effect of testing an overloaded 
system.

Since we have no load regulation mechanism and no overload protection mechanism,
it is relatively easy to push the system until it starts to break down. This is 
what is happening here.
The IMMND misses a timely response to a healthcheck heartbeat.
Such a heartbeat timeout I expect (hope) is set to 2 or 3 minutes.

The *only* reason for the healthcheck's existence is to detect and restart a 
hung/looping process.
If a process is hung or looping it will be so indefinitely. So we don't want 
false positives shooting down the service just because the system is 
temporarily overloaded. This just adds more load.

If a process has not responded in a few minutes then we really assume it is 
hung.
The assumption here is also that a really hung process is a very rare kind of 
problem.
This assumption is correct relative to the IMMND because the IMMND only blocks
on calls to MDS and (ironically) on some synchronous AMF calls.
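
The reasoning about false positives can be sketched as a simple rule: a process counts as hung only once the full timeout has passed without a reply, however slow the earlier replies were. The 180-second value is an assumption based on the "2 or 3 minutes" mentioned above, and the function names are hypothetical.

```python
MAX_DURATION_S = 180.0  # assumed timeout, from the "2 or 3 minutes" above


def is_considered_hung(seconds_since_reply, max_duration=MAX_DURATION_S):
    """Treat a process as hung only once the full timeout has elapsed."""
    return seconds_since_reply > max_duration


def supervise(reply_delays, max_duration=MAX_DURATION_S):
    """Return the index of the first healthcheck that times out, or None."""
    for index, delay in enumerate(reply_delays):
        if is_considered_hung(delay, max_duration):
            return index
    return None


overloaded = [0.2, 45.0, 120.0, 1.0]  # slow under load, but still alive
hung = [0.2, float("inf")]            # never answers the second probe
```

Here `supervise(overloaded)` reports nothing, while `supervise(hung)` flags the second probe. A shorter timeout would restart the overloaded process as well, which only adds load.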


---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby 
controller rebooted in middle of IMMND sync**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Fri Apr 10, 2015 01:09 PM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2)
 (6.8 MB; application/x-bzip)



[tickets] [opensaf:tickets] #1445 imm: Don't check for pending fevs when updating pure runtime attributes

2015-08-14 Thread Anders Bjornerstedt

- **Type**: defect -- enhancement
- **Milestone**: 4.5.2 -- 4.7-Tentative
- **Comment**:

This is an enhancement, not a defect.
There is no violation of interface or behavior on the IMM side related to this 
ticket.

Any IMM call can result in TRY_AGAIN, particularly calls that go remote.
This call is no exception.

The fact that this call is currently oversensitive to fevs overload is not a 
defect.
Fevs overload occurs only because OpenSAF has no overload protection or load 
regulation mechanism. So the fact that it occurs is itself either a problem or 
a stress test.






---

** [tickets:#1445] imm: Don't check for pending fevs when updating pure runtime 
attributes**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Thu Aug 13, 2015 09:48 AM UTC by Hung Nguyen
**Last Updated:** Fri Aug 14, 2015 05:49 AM UTC
**Owner:** Neelakanta Reddy


When saImmOiRtObjectUpdate() is invoked, the number of pending fevs messages is 
always checked on the server side, and TRY_AGAIN is returned when it reaches 
IMMSV_DEFAULT_FEVS_MAX_PENDING.

If the attributes to be updated are pure runtime attributes, the number of 
pending fevs messages should not be checked, because the 
IMMD_EVT_ND2D_OI_OBJ_MODIFY message would not be sent out for broadcast to the 
other IMMNDs.
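
The proposed behavior can be pictured as a small admission check. The C sketch below is only an illustration and not the IMMND code: `admit_rt_update`, `rt_admit_t` and the `max_pending` parameter are hypothetical names, with `max_pending` standing in for IMMSV_DEFAULT_FEVS_MAX_PENDING.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical verdict type; the real server returns SaAisErrorT codes. */
typedef enum { RT_ADMIT_OK, RT_ADMIT_TRY_AGAIN } rt_admit_t;

/*
 * Decide whether an saImmOiRtObjectUpdate() request should be admitted.
 *   pure_runtime : true if every attribute in the update is pure runtime
 *   pending_fevs : number of fevs messages currently pending
 *   max_pending  : limit, standing in for IMMSV_DEFAULT_FEVS_MAX_PENDING
 */
rt_admit_t admit_rt_update(bool pure_runtime, unsigned pending_fevs,
                           unsigned max_pending)
{
    assert(max_pending > 0);

    /* Pure runtime updates are not broadcast over fevs, so the fevs
     * backlog is irrelevant for them. */
    if (pure_runtime)
        return RT_ADMIT_OK;

    /* Other updates go out over fevs: push back once the limit is reached. */
    return pending_fevs >= max_pending ? RT_ADMIT_TRY_AGAIN : RT_ADMIT_OK;
}
```

With this shape, a pure runtime update is admitted regardless of the backlog, while other updates still see TRY_AGAIN at the limit.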


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

2015-08-14 Thread Anders Bjornerstedt




---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on 
ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 07:09 AM UTC
**Owner:** nobody


The SMF service is a heavy user of the IMM service.
The IMM has an established client pattern for ERR_TRY_AGAIN, which gives an 
application realtime control over how long it is prepared to wait for a 
transient inability of the IMM service to fulfill a request.
Each TRY_AGAIN response should in itself be fast, so the application needs a 
delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES. Logically, that 
error code is identical to TRY_AGAIN in that the request could not be accepted 
through no fault of the client, but due to some more or less temporary problem 
in the IMM service. The difference is that NO_RESOURCES has no realtime 
ambitions. Typically the IMM uses this error code when it cannot fulfill a 
request for reasons that are outside the control of the IMM service. Also, the 
time from request to an ERR_NO_RESOURCES response may be long.

The SMF service in general has no realtime requirements. The main goal of the 
SMF service is to successfully complete correctly formulated campaigns. This 
means that the SMF service should be programmed to avoid unnecessary fragility 
related to temporary problems, even if the temporary problem could linger for 
seconds or minutes.

The alternative, aborting the campaign, discards potentially large amounts of 
execution time already completed. It may sometimes even result in a system 
restore.

This means that SMF campaigns should have a retry loop that handles not just 
TRY_AGAIN but also ERR_NO_RESOURCES, wherever this return code is relevant 
(that is, can be returned according to the API spec). The error code ERR_BUSY 
also exists and is for all practical purposes identical to ERR_NO_RESOURCES in 
semantics, both logical and timing.
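
A retry loop of the kind described above could look roughly like the following C sketch. It is not SMF code: `rc_t` is a stand-in for the SaAisErrorT codes named in this ticket, and `retry_imm_op`, `imm_op_fn`, `delay_fn` and `scripted_op` are hypothetical names. The delay is passed in as a callback so that the caller decides how long to wait between attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the SaAisErrorT codes discussed in the ticket. Real code
 * would use the SA_AIS_* constants from the SAF headers instead. */
typedef enum { RC_OK, RC_TRY_AGAIN, RC_NO_RESOURCES, RC_BUSY, RC_OTHER } rc_t;

typedef rc_t (*imm_op_fn)(void *ctx); /* one IMM request (hypothetical) */
typedef void (*delay_fn)(void);       /* pause between attempts */

/* The three codes the ticket treats as "temporary problem, try later". */
static bool rc_is_transient(rc_t rc)
{
    return rc == RC_TRY_AGAIN || rc == RC_NO_RESOURCES || rc == RC_BUSY;
}

/* Run op until it returns a non-transient code or the attempt budget is
 * used up. Returns the last code seen. */
rc_t retry_imm_op(imm_op_fn op, void *ctx, unsigned max_attempts,
                  delay_fn delay)
{
    assert(op != NULL);
    assert(max_attempts > 0);

    rc_t rc = RC_OTHER;
    for (unsigned attempt = 1; attempt <= max_attempts; ++attempt) {
        rc = op(ctx);
        if (!rc_is_transient(rc))
            return rc; /* success or a real error: stop retrying */
        if (delay != NULL && attempt < max_attempts)
            delay(); /* give the IMM service time to recover */
    }
    return rc; /* still transient after the whole budget */
}

/* Demo operation: returns transient_rc while failures_left > 0, then OK. */
struct scripted_op {
    rc_t transient_rc;
    unsigned failures_left;
};

rc_t scripted_op_run(void *ctx)
{
    struct scripted_op *s = ctx;
    if (s->failures_left > 0) {
        s->failures_left--;
        return s->transient_rc;
    }
    return RC_OK;
}
```

Since the ticket notes that SMF has no realtime requirements, a campaign could afford a generous attempt budget and delay; the right values are a policy choice that this sketch leaves to the caller.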


---


[tickets] [opensaf:tickets] #58 IMM: IMM internally should create CCB error strings

2015-08-14 Thread Anders Bjornerstedt

- **Milestone**: future -- 4.6.1



---

** [tickets:#58] IMM: IMM internally should create CCB error strings**

**Status:** fixed
**Milestone:** 4.6.1
**Created:** Wed May 08, 2013 08:35 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:04 AM UTC
**Owner:** nobody


Migrated from:
http://devel.opensaf.org/ticket/2712

For example in this case:
Jul 3 12:39:55.177799 osafimmnd [17744:ImmModel.cc:5146] T7
ERR_NOT_EXIST: object 'safSmfBundle=XXX' does not have an
implementer and flag SA_IMM_CCB_REGISTERED_OI is set


---


[tickets] [opensaf:tickets] #1449 IMM: CCB interface for probing abort reason (validation error or resource error)

2015-08-14 Thread Anders Bjornerstedt




---

** [tickets:#1449] IMM: CCB interface for probing abort reason (validation 
error or resource error)**

**Status:** unassigned
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 08:33 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:33 AM UTC
**Owner:** nobody


Suggested interface, closely related to saImmOmCcbGetErrorStrings():

extern SaAisErrorT saImmOmCcbGetAbortReason(SaImmCcbHandleT ccbHandle,
                                            SaBoolT *isValidationAbort);

Arguments :
   ccbHandle (in)           - The ccb handle.
   isValidationAbort (out)  - SA_TRUE if validation abort, otherwise
                              resource abort.
Return Values :
   SA_AIS_ERR_BAD_HANDLE    - bad ccb handle.
   SA_AIS_ERR_INVALID_PARAM - handle is associated with ccb that is NOT
                              aborted.
   SA_AIS_ERR_VERSION       - (not using A.2.xx)
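
To illustrate how a caller might use the suggested interface, here is a minimal C sketch. Everything in it is an assumption made for illustration: the typedefs are local stand-ins so the block compiles on its own, the body of `saImmOmCcbGetAbortReason` is a mock keyed on made-up handle values, and `ccb_aborted_for_resources` is a hypothetical caller-side helper. The SA_AIS_ERR_VERSION case is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in declarations so the sketch is self-contained. Real code would
 * take these from the SAF headers; the enumerator values are arbitrary. */
typedef unsigned long long SaImmCcbHandleT;
typedef enum { SA_FALSE = 0, SA_TRUE = 1 } SaBoolT;
typedef enum {
    SA_AIS_OK,
    SA_AIS_ERR_INVALID_PARAM,
    SA_AIS_ERR_BAD_HANDLE
} SaAisErrorT;

/* Mock of the suggested call. The handles are invented for the demo:
 * 1 = ccb aborted by validation, 2 = ccb aborted for resource reasons,
 * 3 = ccb that is not aborted, anything else = bad handle. */
SaAisErrorT saImmOmCcbGetAbortReason(SaImmCcbHandleT ccbHandle,
                                     SaBoolT *isValidationAbort)
{
    assert(isValidationAbort != NULL);
    switch (ccbHandle) {
    case 1:
        *isValidationAbort = SA_TRUE;
        return SA_AIS_OK;
    case 2:
        *isValidationAbort = SA_FALSE;
        return SA_AIS_OK;
    case 3:
        return SA_AIS_ERR_INVALID_PARAM; /* ccb is NOT aborted */
    default:
        return SA_AIS_ERR_BAD_HANDLE;
    }
}

/* Hypothetical caller-side helper: true only when the call succeeds and
 * reports a resource abort rather than a validation abort. */
bool ccb_aborted_for_resources(SaImmCcbHandleT ccbHandle)
{
    SaBoolT isValidationAbort = SA_FALSE;

    if (saImmOmCcbGetAbortReason(ccbHandle, &isValidationAbort) != SA_AIS_OK)
        return false;
    return isValidationAbort == SA_FALSE;
}
```

A caller would check the return code first and only then trust the out-parameter, as the helper does.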


---


[tickets] [opensaf:tickets] #1449 IMM: CCB interface for probing abort reason (validation error or resource error)

2015-08-14 Thread Anders Bjornerstedt

- Description has changed:

Diff:



--- old
+++ new
@@ -6,6 +6,7 @@
 Arguments :  ccbHandle (in)-The ccb handle.
 isValidationAbort (out)  - SA_TRUE if validation 
abort otherwise resource abort.
 Return Values :  
+   SA_AIS_OK
SA_AIS_ERR_BAD_HANDLE   - bad ccb handle.
SA_AIS_ERR_INVALID_PARAM - handle is associated with ccb 
that is NOT aborted.
SA_AIS_ERR_VERSION  (not using A.2.xx)






---

** [tickets:#1449] IMM: CCB interface for probing abort reason (validation 
error or resource error)**

**Status:** unassigned
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 08:33 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:33 AM UTC
**Owner:** nobody


Suggested interface, closely related to saImmOmCcbGetErrorStrings():

extern SaAisErrorT saImmOmCcbGetAbortReason(SaImmCcbHandleT ccbHandle,
                                            SaBoolT *isValidationAbort);

Arguments :
   ccbHandle (in)           - The ccb handle.
   isValidationAbort (out)  - SA_TRUE if validation abort, otherwise
                              resource abort.
Return Values :
   SA_AIS_OK
   SA_AIS_ERR_BAD_HANDLE    - bad ccb handle.
   SA_AIS_ERR_INVALID_PARAM - handle is associated with ccb that is NOT
                              aborted.
   SA_AIS_ERR_VERSION       - (not using A.2.xx)


---


[tickets] [opensaf:tickets] #58 IMM: IMM internally should create CCB error strings

2015-08-14 Thread Anders Bjornerstedt

- **status**: unassigned -- fixed
- **Comment**:

For some releases now, the IMM has generated error strings internally.
The specific example above is covered.



---

** [tickets:#58] IMM: IMM internally should create CCB error strings**

**Status:** fixed
**Milestone:** future
**Created:** Wed May 08, 2013 08:35 AM UTC by Anders Bjornerstedt
**Last Updated:** Wed May 08, 2013 08:35 AM UTC
**Owner:** nobody


Migrated from:
http://devel.opensaf.org/ticket/2712

For example in this case:
Jul 3 12:39:55.177799 osafimmnd [17744:ImmModel.cc:5146] T7
ERR_NOT_EXIST: object 'safSmfBundle=XXX' does not have an
implementer and flag SA_IMM_CCB_REGISTERED_OI is set


---


[tickets] [opensaf:tickets] #744 IMM: Use error string to classify cause for aborted CCB.

2015-08-14 Thread Anders Bjornerstedt

Some work on this has already been done.

It remains to:

1) Survey the IMM server CCB abort handling to see if there are cases where 
the error string or the abort cause prefix is missing.
2) Ensure that the prefix is uniform, i.e. amenable to string matching.

The ccb abort reason prefix is prepended to the text so that the abort reason 
becomes obvious to a human user (e.g. in immcfg output).

The intention is not to have programmers do string matching directly on the 
abort prefix.
Even though that is technically possible, we will provide a wrapper function 
to be used for programmatic handling of the abort reason. A separate ticket 
will be created for that.




---

** [tickets:#744] IMM: Use error string to classify cause for aborted CCB.**

**Status:** unassigned
**Milestone:** future
**Created:** Thu Jan 23, 2014 12:17 PM UTC by Anders Bjornerstedt
**Last Updated:** Tue Jun 30, 2015 12:30 PM UTC
**Owner:** nobody


This is a special case of #58 (http://sourceforge.net/p/opensaf/tickets/58).

Enhancement #58 is a bit large and open-ended. This ticket focuses on a
particular need for complementary information about one error return code.

If a CCB related operation returns SA_AIS_ERR_FAILED_OPERATION it means that
the CCB has been aborted and the CCB-handle can no longer be used to generate
new (chained) CCBs. 

The cause of the aborted CCB can be classified into two broad mutually exclusive
categories:

   1) Logical errors related to the CCB buildup/contents. This would primarily
      be validation errors where an OI rejects a ccb-operation or rejects
      an apply.

   2) Resource problems in the immsv. This could be the need for imm-sync that
      gets priority over current non-empty CCBs that are not in critical. Or
      it could be a hung PBE that gets restarted and finds the CCB did not
      complete the commit, resulting in an abort. Or other reasons in immsv
      or below.

Some applications that have the capability to record an attempted CCB at
the application level, may wish to attempt a replay of an aborted CCB,
but only if the CCB was aborted due to a cause in category (2).

One could refine this to distinguish within (2) between definitely transient
resource problems (imm sync) and likely stable resource limits (huge CCBs
that fail to commit over PBE). The latter are more likely to repeatedly fail.
But such refinement will not be done in this ticket.

The idea is to prefix the error string with an identifiable tag of some form.
The tag must be documented in the IMMSV README and the IMMSV_PR.
This would make it relatively simple for an application developer to write
code to match against the initial sub-string.
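
As a rough illustration of matching against the initial sub-string, the C sketch below classifies a NULL-terminated array of error strings by tag. The tag texts `VALIDATION-ABORT: ` and `RESOURCE-ABORT: ` are placeholders invented for this example, since the ticket only says that a tag will be documented; `classify_abort` and `abort_cause_t` are likewise hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { ABORT_UNKNOWN, ABORT_VALIDATION, ABORT_RESOURCE } abort_cause_t;

/* Placeholder tags, NOT the documented ones. */
#define TAG_VALIDATION "VALIDATION-ABORT: "
#define TAG_RESOURCE   "RESOURCE-ABORT: "

/* Non-zero if s begins with prefix. */
static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classify a NULL-terminated array of error strings by the first tagged
 * string found. Untagged strings are skipped. */
abort_cause_t classify_abort(const char *const *error_strings)
{
    assert(error_strings != NULL);

    for (size_t i = 0; error_strings[i] != NULL; ++i) {
        if (starts_with(error_strings[i], TAG_VALIDATION))
            return ABORT_VALIDATION;
        if (starts_with(error_strings[i], TAG_RESOURCE))
            return ABORT_RESOURCE;
    }
    return ABORT_UNKNOWN;
}
```

Keeping the tag texts behind one function means that callers depend only on the classification and not on the exact wording of the prefix.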





---


[tickets] [opensaf:tickets] #744 IMM: Use error string to classify cause for aborted CCB.

2015-08-14 Thread Anders Bjornerstedt

- **Milestone**: future -- 4.7-Tentative



---

** [tickets:#744] IMM: Use error string to classify cause for aborted CCB.**

**Status:** unassigned
**Milestone:** 4.7-Tentative
**Created:** Thu Jan 23, 2014 12:17 PM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:13 AM UTC
**Owner:** nobody


This is a special case of #58 (http://sourceforge.net/p/opensaf/tickets/58).

Enhancement #58 is a bit large and open-ended. This ticket focuses on a
particular need for complementary information about one error return code.

If a CCB related operation returns SA_AIS_ERR_FAILED_OPERATION it means that
the CCB has been aborted and the CCB-handle can no longer be used to generate
new (chained) CCBs. 

The cause of the aborted CCB can be classified into two broad mutually exclusive
categories:

   1) Logical errors related to the CCB buildup/contents. This would primarily
      be validation errors where an OI rejects a ccb-operation or rejects
      an apply.

   2) Resource problems in the immsv. This could be the need for imm-sync that
      gets priority over current non-empty CCBs that are not in critical. Or
      it could be a hung PBE that gets restarted and finds the CCB did not
      complete the commit, resulting in an abort. Or other reasons in immsv
      or below.

Some applications that have the capability to record an attempted CCB at
the application level, may wish to attempt a replay of an aborted CCB,
but only if the CCB was aborted due to a cause in category (2).

One could refine this to distinguish within (2) between definitely transient
resource problems (imm sync) and likely stable resource limits (huge CCBs
that fail to commit over PBE). The latter are more likely to repeatedly fail.
But such refinement will not be done in this ticket.

The idea is to prefix the error string with an identifiable tag of some form.
The tag must be documented in the IMMSV README and the IMMSV_PR.
This would make it relatively simple for an application developer to write
code to match against the initial sub-string.





---


[tickets] [opensaf:tickets] #1313 osaf: opensaf does not start when long dn object is present in imm.db and cluster is reset

2015-08-13 Thread Anders Bjornerstedt

- **Comment**:

I think this ticket should be closed as invalid.
The mechanism works as documented.

The only relevant defect related to this incident is #1430, and it has been 
fixed.
https://sourceforge.net/p/opensaf/tickets/1430/
That is, an application that does a search with a scope that does *not* include 
any long DN object should not get hit by any longDn error.



---

** [tickets:#1313] osaf: opensaf does not start when long dn object is present 
in imm.db and cluster is reset**

**Status:** unassigned
**Milestone:** 4.5.1
**Created:** Mon Apr 13, 2015 08:57 AM UTC by Sirisha Alla
**Last Updated:** Wed Apr 22, 2015 07:05 AM UTC
**Owner:** Mathi Naickan
**Attachments:**

- 
[slot1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1313/attachment/slot1.tar.bz2)
 (269.6 kB; application/x-bzip)


This is observed on changeset 6377 (46FC Tag). The system is up with a single 
PBE and 50k objects. Long DNs were enabled. There is one long DN object in the 
cluster.

Syslog on SC-1:

Apr  9 15:49:14 SLES-64BIT-SLOT1 osafimmnd[10731]: WA Setting attr 
longDnsAllowed to 0 in opensafImm=opensafImm,safApp=safImmService not allowed 
when long RDN exists inside object: 
xattrName_testAdminOwnerClear_SubLevelScope_1011

Now the cluster is reset. Nodes in the cluster fail to come up with the 
following reason:

Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Persistent Back End OI 
attached, pid: 3465
Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Implementer connected: 1 
(OpenSafImmPBE) 10, 2010f
Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO implementer for class 
'OpensafImm' is OpenSafImmPBE = class extent is safe.
Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 20 committing 
with ccbId:10003/4294967299
Apr 13 13:04:56 SLES-64BIT-SLOT1 osafimmnd[3439]: NO PBE-OI established on this 
SC. Dumping incrementally to file imm.db
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from 
LOGD
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Going for recovery
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN 
/usr/lib64/opensaf/clc-cli/osaf-logd attempt #1
Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, 
pid=3452
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: Started
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: WA read_logsv_configuration(). 
All attributes could not be read
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO Log config system: high 0 
low 0, application: high 0 low 0
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO log root directory is: 
/var/log/opensaf/saflog
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LOG data group is:
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LGS_MBCSV_VERSION = 4
Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: saImmOmSearchInitialize 
FAILED, rc = 13
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from 
LOGD
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN 
/usr/lib64/opensaf/clc-cli/osaf-logd attempt #2
Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, 
pid=3495
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: Started
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: WA read_logsv_configuration(). 
All attributes could not be read
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO Log config system: high 0 
low 0, application: high 0 low 0
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO log root directory is: 
/var/log/opensaf/saflog
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LOG data group is:
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LGS_MBCSV_VERSION = 4
Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: saImmOmSearchInitialize 
FAILED, rc = 13
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from 
LOGD
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER
Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER FAILED TO RESPAWN
Apr 13 13:07:24 SLES-64BIT-SLOT1 osaffmd[3419]: exiting for shutdown
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmd[3429]: exiting for shutdown
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmnd[3439]: NO No IMMD service = cluster 
restart, exiting
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmpbed: WA PBE lost contact with parent 
IMMND - Exiting
Apr 13 13:07:24 SLES-64BIT-SLOT1 osafrded[3410]: exiting for shutdown
Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782513] TIPC: Disabling bearer 
eth:eth0
Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782518] TIPC: Lost

[tickets] [opensaf:tickets] Re: #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup.

2015-08-10 Thread Anders Bjornerstedt

It makes absolutely no sense to have a defect ticket on a future release.

/AndersBj

From: A V Mahesh (AVM) [mailto:avmah...@users.sf.net]
Sent: den 6 augusti 2015 06:22
To: [opensaf:tickets]
Subject: [opensaf:tickets] #246 cpsv: Section create fails with random return 
values when mulitple processes try to create sections in the same checkpoint 70 
node setup.

  *   Type: enhancement -- defect
  *   Milestone: future -- 4.7-Tentative
  *   Comment:

Need to reproduce on current staging; if the issue exists, it needs to be fixed 
in the 4.7 release.

[tickets:#246]http://sourceforge.net/p/opensaf/tickets/246/ cpsv: Section 
create fails with random return values when mulitple processes try to create 
sections in the same checkpoint 70 node setup.

Status: assigned
Milestone: 4.7-Tentative
Created: Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM)
Last Updated: Wed Jul 15, 2015 02:46 PM UTC
Owner: A V Mahesh (AVM)

from http://devel.opensaf.org/ticket/2386

Changeset: 3065
Setup: 70 node SLES11 VM setup

2 applications per node are running on a 70 node setup.

Collocated checkpoint is created. After active replica is set from one process, 
section create with section id as GENERATED_SECTION_ID is invoked from rest of 
the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, 
ERR_TRY_AGAIN.

/var/log/messages for the two controllers will be shared.

Sent from sourceforge.net because you indicated interest in 
https://sourceforge.net/p/opensaf/tickets/246/

To unsubscribe from further messages, please visit 
https://sourceforge.net/auth/subscriptions/

---

** [tickets:#246] cpsv: Section create fails with random return values when 
mulitple processes try to create sections in the same checkpoint  70 node 
setup. **

**Status:** assigned
**Milestone:** 4.7-Tentative
**Created:** Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM)
**Last Updated:** Thu Aug 06, 2015 04:21 AM UTC
**Owner:** A V Mahesh (AVM)

 from http://devel.opensaf.org/ticket/2386

 Changeset: 3065
Setup: 70 node SLES11 VM setup

2 applications per node are running on a 70 node setup. 

Collocated checkpoint is created. After active replica is set from one process, 
section create with section id as GENERATED_SECTION_ID is invoked from rest of 
the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, 
ERR_TRY_AGAIN.

/var/log/messages for the two controllers will be shared.

---


[tickets] [opensaf:tickets] Re: #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup.

2015-08-10 Thread Anders Bjornerstedt

What I mean is: this is an old ticket reporting a problem observed on an old 
release (4.2 according to the ticket).
But you are declaring that the problem will only be fixed for a future release!
If that is the intention, then this *is* by definition an enhancement and not a 
defect.
/AndersBj

From: Anders Bjornerstedt [mailto:ander...@users.sf.net]
Sent: den 10 augusti 2015 09:02
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #246 cpsv: Section create fails with random 
return values when mulitple processes try to create sections in the same 
checkpoint 70 node setup.


It makes absolutely no sense to have a defect ticket on a future release.

/AndersBj

From: A V Mahesh (AVM) [mailto:avmah...@users.sf.net]
Sent: den 6 augusti 2015 06:22
To: [opensaf:tickets]
Subject: [opensaf:tickets] #246 cpsv: Section create fails with random return 
values when mulitple processes try to create sections in the same checkpoint 70 
node setup.

  *   Type: enhancement -- defect
  *   Milestone: future -- 4.7-Tentative
  *   Comment:

Need to reproduce on current staging; if the issue exists, it needs to be fixed 
in the 4.7 release.



[tickets:#246]http://sourceforge.net/p/opensaf/tickets/246/ cpsv: Section 
create fails with random return values when mulitple processes try to create 
sections in the same checkpoint 70 node setup.

Status: assigned
Milestone: 4.7-Tentative
Created: Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM)
Last Updated: Wed Jul 15, 2015 02:46 PM UTC
Owner: A V Mahesh (AVM)

from http://devel.opensaf.org/ticket/2386

Changeset: 3065
Setup: 70 node SLES11 VM setup

2 applications per node are running on a 70 node setup.

Collocated checkpoint is created. After active replica is set from one process, 
section create with section id as GENERATED_SECTION_ID is invoked from rest of 
the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, 
ERR_TRY_AGAIN.

/var/log/messages for the two controllers will be shared.






[tickets:#246]http://sourceforge.net/p/opensaf/tickets/246/ cpsv: Section 
create fails with random return values when mulitple processes try to create 
sections in the same checkpoint 70 node setup.

Status: assigned
Milestone: 4.7-Tentative
Created: Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM)
Last Updated: Thu Aug 06, 2015 04:21 AM UTC
Owner: A V Mahesh (AVM)

from http://devel.opensaf.org/ticket/2386

Changeset: 3065
Setup: 70 node SLES11 VM setup

2 applications per node are running on a 70 node setup.

Collocated checkpoint is created. After active replica is set from one process, 
section create with section id as GENERATED_SECTION_ID is invoked from rest of 
the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, 
ERR_TRY_AGAIN.

/var/log/messages for the two controllers will be shared.






---

** [tickets:#246] cpsv: Section create fails with random return values when 
mulitple processes try to create sections in the same checkpoint  70 node 
setup. **

**Status:** assigned
**Milestone:** 4.7-Tentative
**Created:** Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM)
**Last Updated:** Thu Aug 06, 2015 04:21 AM UTC
**Owner:** A V Mahesh (AVM)


 from http://devel.opensaf.org/ticket/2386

 Changeset: 3065
Setup: 70 node SLES11 VM setup


2 applications per node are running on a 70 node setup. 


Collocated checkpoint is created. After active replica is set from one process, 
section create with section id as GENERATED_SECTION_ID is invoked from rest of 
the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, 
ERR_TRY_AGAIN.


/var/log/messages for the two controllers will be shared.





---


[tickets] [opensaf:tickets] #1436 MDS (TCP transport) fragment gets dropped, not received on standby node

2015-08-10 Thread Anders Bjornerstedt

- **Milestone**: 4.7-Tentative -- 4.6.1



---

** [tickets:#1436] MDS (TCP transport) fragment gets dropped, not received on 
standby node**

**Status:** unassigned
**Milestone:** 4.6.1
**Created:** Thu Aug 06, 2015 06:47 AM UTC by Girish
**Last Updated:** Mon Aug 10, 2015 04:25 AM UTC
**Owner:** nobody
**Attachments:**

- 
[cpsv_test_app.c](https://sourceforge.net/p/opensaf/tickets/1436/attachment/cpsv_test_app.c)
 (8.5 kB; text/x-csrc)


Opensaf version: 4.6
Linux: Standard Fedora 22 release, no additional patches required
default wmem_max/rmem_max values 
default buffer sizes for MDS_SOCK_SND_RCV_BUF_SIZE and DTM_SOCK_SND_RCV_BUF_SIZE
Active-standby model
opensaf run as root user/group

Steps:
 1. start opensaf on node1 (active) and node2 (standby)
 2. start ckpt_demo (modified application attached) on active node, ./ckpt_demo 1
 3. wait till all the data is checkpointed
 4. start ckpt_demo on standby node, ./ckpt_demo 0
 
 Notice Error messages in mds.log:
 
 MDTM: Some stale message recd, hence dropping adest=
 
 My investigation indicates that one of the fragments is lost: the active node 
sends it, whereas the standby node does not receive it.
 
 mds log on standby:
 
 May 29  4:30:03.089974 8461 ERR| 
mdtm_process_poll_recv_data_tcp
May 29  4:30:03.089995 8461 ERR|before mds_mdtm_process_recvdata fun-call 
1, recd_bytes=1454, buff_toal_len=1454
May 29  4:30:03.090014 8461 ERR|MDTM: Recd message with Fragment 
Seqnum=18, frag_num=3049, from src_Tipc_id=0x0002020f:25826, pkt_type=35817
May 29  4:30:03.090032 8461 ERR|MDTM: Reassembling in FULL UB
May 29  4:30:03.090174 8461 ERR|mdtm_process_recv_events_tcp: pollres=1
May 29  4:30:03.090198 8461 ERR|mdtm_process_recv_events_tcp: 
pfd[0].revents=1
May 29  4:30:03.090216 8461 ERR| 
mdtm_process_poll_recv_data_tcp
May 29  4:30:03.090238 8461 ERR|before mds_mdtm_process_recvdata fun-call 
1, recd_bytes=1454, buff_toal_len=1454
May 29  4:30:03.090257 8461 ERR|MDTM: Recd message with Fragment 
Seqnum=18, frag_num=3050, from src_Tipc_id=0x0002020f:25826, pkt_type=35818
May 29  4:30:03.090275 8461 ERR|MDTM: Reassembling in FULL UB
May 29  4:30:03.090735 8461 ERR|mdtm_process_recv_events_tcp: pollres=1
May 29  4:30:03.090762 8461 ERR|mdtm_process_recv_events_tcp: 
pfd[0].revents=1
May 29  4:30:03.090780 8461 ERR| 
mdtm_process_poll_recv_data_tcp
May 29  4:30:03.090801 8461 ERR|before mds_mdtm_process_recvdata fun-call 
1, recd_bytes=1454, buff_toal_len=1454
May 29  4:30:03.090820 8461 ERR|MDTM: Recd message with Fragment 
Seqnum=18, frag_num=3051, from src_Tipc_id=0x0002020f:25826, pkt_type=35819
May 29  4:30:03.090838 8461 ERR|MDTM: Reassembling in FULL UB
May 29  4:30:03.090978 8461 ERR|mdtm_process_recv_events_tcp: pollres=1
May 29  4:30:03.091028 8461 ERR|mdtm_process_recv_events_tcp: 
pfd[0].revents=1
May 29  4:30:03.091047 8461 ERR| 
mdtm_process_poll_recv_data_tcp
May 29  4:30:03.091068 8461 ERR|before mds_mdtm_process_recvdata fun-call 
1, recd_bytes=1454, buff_toal_len=1454
May 29  4:30:03.091087 8461 ERR|MDTM: Recd message with Fragment 
Seqnum=18, frag_num=3053, from src_Tipc_id=0x0002020f:25826, pkt_type=35821
May 29  4:30:03.091106 8461 ERR|MDTM: ERROR Frag recd is not next frag so 
dropping adest=0x0002020f64e2
May 29  4:30:03.091125 8461 ERR|mdtm_process_recv_events_tcp: pollres=1
May 29  4:30:03.091143 8461 ERR|mdtm_process_recv_events_tcp: 
pfd[0].revents=1
May 29  4:30:03.091160 8461 ERR| 
mdtm_process_poll_recv_data_tcp
May 29  4:30:03.091180 8461 ERR|before mds_mdtm_process_recvdata fun-call 
1, recd_bytes=1454, buff_toal_len=1454
May 29  4:30:03.091198 8461 ERR|MDTM: Recd message with Fragment 
Seqnum=18, frag_num=3054, from src_Tipc_id=0x0002020f:25826, pkt_type=35822
May 29  4:30:03.091216 8461 ERR|MDTM: Message is dropped as msg is out of 
seq TRANSPOR-ID=0x0002020f64e2 
May 29  4:30:03.091235 8461 ERR|mdtm_process_recv_events_tcp: pollres=1
May 29  4:30:03.091283 8461 ERR|mdtm_process_recv_events_tcp: 
pfd[0].revents=1
May 29  4:30:03.091302 8461 ERR| 
mdtm_process_poll_recv_data_tcp
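
The gap visible in the standby log (frag_num 3051 followed by 3053) is the kind of condition an in-order sequence check catches. The following is a minimal sketch of such a check, not the MDS implementation; the class and method names are hypothetical, and the assumption is that a single missing fragment invalidates the whole message.

```python
class Reassembler:
    """Toy in-order fragment reassembler (hypothetical, not MDS code)."""

    def __init__(self, first_frag_num):
        self.next_frag = first_frag_num
        self.parts = []
        self.dropped = False

    def receive(self, frag_num, payload):
        # Once a gap has been seen, the rest of the message is discarded too.
        if self.dropped:
            return "dropped: message already out of sequence"
        if frag_num != self.next_frag:
            self.dropped = True
            self.parts.clear()
            return f"dropped: expected {self.next_frag}, got {frag_num}"
        self.parts.append(payload)
        self.next_frag += 1
        return "ok"


# Replay the fragment numbers seen in the standby log.
r = Reassembler(first_frag_num=3049)
results = [r.receive(n, b"x") for n in (3049, 3050, 3051, 3053, 3054)]
```

With this model, one lost fragment is enough to make every later fragment of the same message be dropped, which matches the pattern of the two error lines in the log.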


mds log on active:

May 29  4:29:36.021518 25826 ERR|before mds_mdtm_process_recvdata 
fun-call 1, recd_bytes=1454, buff_toal_len=1454
May 29  4:29:36.021537 25826 ERR|MDTM: Recd message with Fragment 
Seqnum=5, frag_num=3049, from src_Tipc_id=0x0002020f:25995, pkt_type=35817
May 29  4:29:36.021554 25826 ERR|MDTM: Reassembling in flat UB
May 29  4:29:36.021702 25995 ERR|successfully sent message, send_len=1456
May 29  4:29:36.021729 25995 ERR|MDTM:2 Sending message with Service 
Seqno=4, Fragment Seqnum=5, frag_num=35818, TO Dest_Tipc_id=0x0002020f:25826
May 29  4:29:36.021778 25826 ERR|mdtm_process_recv_events_tcp: pollres=1
May 29  4:29:36.021800 25826 ERR|mdtm_process_recv_events_tcp: 
pfd[0].revents=1
May 29  4:29:36.021817 25826 ERR| 
mdtm_process_poll_recv_data_tcp
May 29

[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag

2015-07-24 Thread Anders Bjornerstedt

- **status**: review --> fixed
- **Comment**:

changeset:   6681:87d18f870326
tag: tip
user:Johan Mårtensson johan.o.martens...@ericsson.com
date:Thu Jul 16 14:25:08 2015 +0200
summary: pyosaf: (updated) Add parameter to Ccb constructor to set exact 
CCB flags [#1417]




---

** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the 
SA_IMM_CCB_REGISTERED flag**

**Status:** fixed
**Milestone:** 4.7-Tentative
**Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson
**Last Updated:** Wed Jul 15, 2015 12:43 PM UTC
**Owner:** Johan Mårtensson


The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user 
code to make changes to IMM via CCBs. It unconditionally sets the 
SA_IMM_CCB_REGISTERED_FLAG which means that any code that requires the flag to 
be unset must use the low-level interface instead. The Ccb class should be 
enhanced to allow turning SA_IMM_CCB_REGISTERED off.
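
A minimal sketch of the idea, assuming a constructor parameter that takes the exact list of CCB flags (as the changeset summary suggests). The class below is a hypothetical stand-in, not the pyosaf code, and the flag value 0x1 is an assumption.

```python
SA_IMM_CCB_REGISTERED = 0x1  # assumed value, for illustration only


class Ccb:
    """Hypothetical stand-in for the pyosaf Ccb wrapper."""

    def __init__(self, flags=None):
        # Default keeps the old behaviour: the registered flag is set.
        if flags is None:
            flags = [SA_IMM_CCB_REGISTERED]
        self.ccb_flags = 0
        for flag in flags:
            self.ccb_flags |= flag


default_ccb = Ccb()               # flag set, as before
unregistered_ccb = Ccb(flags=[])  # caller turns the flag off
```

Defaulting to the old flag keeps existing callers unchanged, while an empty list gives code that needs the flag unset a way to stay on the high-level interface.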


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #1425 IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT

2015-07-24 Thread Anders Bjornerstedt




---

** [tickets:#1425] IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Jul 24, 2015 12:49 PM UTC by Anders Bjornerstedt
**Last Updated:** Fri Jul 24, 2015 12:49 PM UTC
**Owner:** nobody


The saImmOmClassCreate_2() API allows the user to provide a list of attribute 
definitions. An attribute definition may include a default value.

The default value will be assigned to this attribute in an instance being created
by the saImmOmCcbObjectCreate_2() or the saImmOiRtObjectCreate_2() APIs, if the
user does not provide a value for that attribute.

But a user/OI may later update such an object/attribute, assigning the empty value
to the attribute. So the default value mechanism is only effective for object
creation and not later in the life cycle of the object. This makes the default
attribute value mechanism weaker than some users would like.

This enhancement proposes a new attribute flag SA_IMM_ATTR_STRONG_DEFAULT.
This flag will only be allowed to be set on an attribute definition that includes
a default value.

The meaning of the flag is that if a user attempts an update of an object/attribute
that assigns the empty value to such an attribute, then the IMM will replace, i.e.
override, that value with the default value defined in the class.
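
The proposed semantics can be sketched as follows. This is an illustration only: the flag's numeric value is a placeholder (the ticket does not give one), `None` stands in for the empty value, and `AttrDef` and `apply_update` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

SA_IMM_ATTR_STRONG_DEFAULT = 0x1  # placeholder bit, not the real value


@dataclass
class AttrDef:
    name: str
    default: Optional[str] = None
    flags: int = 0

    def __post_init__(self):
        # The flag is only allowed on definitions that include a default.
        if self.flags & SA_IMM_ATTR_STRONG_DEFAULT and self.default is None:
            raise ValueError("STRONG_DEFAULT requires a default value")


def apply_update(attr_def, new_value):
    """Return the value that would be stored after an update."""
    if new_value is None and attr_def.flags & SA_IMM_ATTR_STRONG_DEFAULT:
        return attr_def.default  # empty value is overridden by the default
    return new_value


weak = AttrDef("timeout", default="30")
strong = AttrDef("timeout", default="30", flags=SA_IMM_ATTR_STRONG_DEFAULT)
```

Here `apply_update(weak, None)` leaves the attribute empty, while `apply_update(strong, None)` stores the class default.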





---


[tickets] [opensaf:tickets] #1424 Agent crash leaving zombie OI

2015-07-22 Thread Anders Bjornerstedt

- **status**: unassigned --> assigned
- **assigned_to**: Hung Nguyen



---

** [tickets:#1424] Agent crash leaving zombie OI**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Wed Jul 22, 2015 10:34 AM UTC by Hung Nguyen
**Last Updated:** Wed Jul 22, 2015 10:34 AM UTC
**Owner:** Hung Nguyen


IMCN sets IMMA_SYNCR_TIMEOUT to 1 second.

For some reason osafntfimcnd timed out when invoking saImmOiImplementerSet(),
and then it exited.
Jun 25 16:24:49 SC-1 osafntfimcnd[14926]: ER ntfimcn_imm_init Becoming an 
applier failed SA_AIS_ERR_TIMEOUT (5)
Jun 25 16:24:49 SC-1 osafntfimcnd[14926]: ER ntfimcn_imm_init() Fail

IMMND got the IMMA_DOWN event before receiving IMMND_EVT_D2ND_IMPLSET_RSP from IMMD.
IMMND tried to discard the implementer of the client, but there was nothing to
discard at that moment.

Later, IMMND received IMMND_EVT_D2ND_IMPLSET_RSP and the implementer was added
to immModel.
Jun 25 16:24:50 SC-1 osafimmnd[14887]: NO Implementer (applier) connected: 6 
(@OpenSafImmReplicatorB) 15, 2010f
Jun 25 16:24:50 SC-1 osafimmnd[14887]: WA IMMND - Client went down so no 
response

So when IMMND uses immnd_client_node_get() to get the client node of the
implementer, it returns NULL and the assertion fails.
In this case, that happened in immnd_evt_proc_object_modify().
Jun 25 16:25:04 SC-1 osafimmnd[14887]: immnd_evt.c:6242: 
immnd_evt_proc_object_modify: Assertion 'oi_cl_node != NULL' failed.
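
The ordering problem can be reproduced with a small model. This is a sketch under assumptions, not IMMND code: the class, its methods and the client id are hypothetical, and the `guard` option only illustrates one possible way to avoid the zombie, not necessarily the fix chosen for this ticket.

```python
class Immnd:
    """Toy model of the race (hypothetical, not IMMND code)."""

    def __init__(self, guard):
        self.guard = guard
        self.clients = set()
        self.implementers = {}  # implementer name -> owning client id

    def client_down(self, client):
        self.clients.discard(client)
        # Discard implementers owned by the client; in the race none exist yet.
        self.implementers = {
            n: c for n, c in self.implementers.items() if c != client
        }

    def implset_rsp(self, name, client):
        if self.guard and client not in self.clients:
            return False  # client already gone, do not register
        self.implementers[name] = client
        return True

    def zombies(self):
        # Implementers whose owning client no longer exists.
        return [n for n, c in self.implementers.items() if c not in self.clients]


def run(guard):
    nd = Immnd(guard)
    nd.clients.add(15)                            # hypothetical client id
    nd.client_down(15)                            # IMMA_DOWN arrives first
    nd.implset_rsp("@OpenSafImmReplicatorB", 15)  # response arrives later
    return nd.zombies()
```

Without the guard, `run(False)` leaves `@OpenSafImmReplicatorB` registered with no client behind it, which is the state that later trips the assertion.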


---


[tickets] [opensaf:tickets] #171 saAmfComponentUnregister should be unexposed to handle obtained from B 4 1 version

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#171] saAmfComponentUnregister should be unexposed to handle 
obtained from B 4 1 version**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:01 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 06:01 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2019

The saAmfComponentUnregister API should return SA_AIS_ERR_VERSION when called
with a handle obtained from the B.4.1 version.





---


[tickets] [opensaf:tickets] #167 AMF: CSI descriptor for standby assignment contains wrong info

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#167] AMF: CSI descriptor for standby assignment contains wrong 
info**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 04:50 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 04:50 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/1790

Testing with the configuration specified in 3.6.4. After starting a 3-node
cluster, the CSI set callbacks for standby assignments in the demo component
always get standbyRank 0, and activeCompName is not correct.


Dispatched healthCheck 2 in 
'safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo?'


Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI1,safApp=AmfDemo?' 
HAState: Active CSIFlags: Add One
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI4,safApp=AmfDemo?' 
HAState: Active CSIFlags: Add One
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI5,safApp=AmfDemo?' 
HAState: Active CSIFlags: Add One
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI2,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI3,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI6,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo?


Dispatched healthCheck 1 in 
'safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo?'


Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI2,safApp=AmfDemo?' 
HAState: Active CSIFlags: Add One
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI6,safApp=AmfDemo?' 
HAState: Active CSIFlags: Add One
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI1,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI3,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI4,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI5,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo?


Dispatched healthCheck 1 in 
'safComp=AmfDemo?,safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo?'


Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI3,safApp=AmfDemo?' 
HAState: Active CSIFlags: Add One
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI1,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI2,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI4,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI5,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo?
Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI6,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, 
activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo?





---


[tickets] [opensaf:tickets] #168 saAmfComponentErrorReport() - errorDetectionTime not initialized by library

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#168] saAmfComponentErrorReport() - errorDetectionTime not 
initialized by library**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 05:37 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 05:37 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/1944

The AMF spec states that the library should initialize the absolute time when
an error was reported. Today this is done by amfnd and not by the library.
This can make the reported error time inaccurate, since it then depends on
amfnd load.
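
A rough sketch of stamping the time on the library side, before the report is handed over to amfnd. The function name, the dictionary layout and the convention that 0 means "not provided by the caller" are all assumptions made for illustration.

```python
import time


def error_report(comp_name, error_detection_time=0, now_ns=time.time_ns):
    """Hypothetical library-side wrapper: stamp the time before sending."""
    if error_detection_time == 0:
        # Taken in the caller's process, so it does not depend on amfnd load.
        error_detection_time = now_ns()
    return {"comp": comp_name, "errorDetectionTime": error_detection_time}


stamped = error_report("safComp=Demo", now_ns=lambda: 42)
explicit = error_report("safComp=Demo", error_detection_time=7)
```

The injectable `now_ns` is only there to make the behaviour easy to check; a caller-supplied time is passed through untouched.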





---


[tickets] [opensaf:tickets] #178 escalation policy is not happening till the restart count exceeds, instead of reaching saAmfSGCompRestartMax for NPI components

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#178] escalation policy is not happening till the restart count 
exceeds, instead of reaching saAmfSGCompRestartMax for NPI components**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:24 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 06:24 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2144

Error escalation does not happen until the restart count exceeds
saAmfSGCompRestartMax for components brought up as NPI.


But according to the spec, first-level escalation should happen when the
restart count reaches saAmfSGCompRestartMax.


From the spec, section 3.11.2.2, page 203:


If this count reaches the saAmfSGCompRestartMax value before the end of the
component restart probation period, the Availability Management Framework
performs the first level of recovery escalation for that service unit: the
Availability Management Framework restarts the entire service unit
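
The difference between "exceeds" and "reaches" is a single comparison operator. A minimal sketch, with hypothetical function names:

```python
def escalates_as_reported(restart_count, restart_max):
    # Behaviour described in this ticket: escalate only after exceeding the max.
    return restart_count > restart_max


def escalates_per_spec(restart_count, restart_max):
    # "Reaches" in the quoted spec text: escalate as soon as count == max.
    return restart_count >= restart_max


restart_max = 3
reported = [escalates_as_reported(n, restart_max) for n in (2, 3, 4)]
expected = [escalates_per_spec(n, restart_max) for n in (2, 3, 4)]
```

The two variants differ only at `restart_count == restart_max`, which is one extra component restart before the SU restart.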





---


[tickets] [opensaf:tickets] #183 component's operational state and SU's presence state are not updated in the multiple NPI components instantiation failure

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#183] component's operational state and SU's presence state are not 
updated in the multiple NPI components instantiation failure**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 07:00 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 07:00 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2178

1. Unlocked the NPI SU having 4 components.
2. The first two components were instantiated properly; instantiation of the
third component failed.
3. AMF tried to clean up the third component, which failed, and the third
component moved to instantiation-failure.
4. Now AMF should terminate the instantiated components, and the presence
state for the SU and the components should be set to TERMINATING, but the
presence state is not updated for the SU in the current implementation.


5. AMF tried to terminate the first and second components, which failed, and
cleanup also failed. Hence termination failure for the first and second
components. This is OK and according to section 3.2 on page 62.


But according to spec section 4.8, when the presence state is set to
termination-failed, the component's operational state should be set to
DISABLED, which is not done in the current implementation.


Finally, the SU's presence state should be set to termination-failure,
which is not updated in the current implementation.






---


[tickets] [opensaf:tickets] #181 aAmfPmStart and Stop APIs with pmErrors = SA_AMF_PM_ABNORMAL_END gives ERR_INVALID_PARAM instead of OK.

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#181] aAmfPmStart and Stop APIs with pmErrors = 
SA_AMF_PM_ABNORMAL_END gives ERR_INVALID_PARAM instead of OK.**

**Status:** assigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Aug 27, 2013 06:54 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2147

Passing SA_AMF_PM_ABNORMAL_END as the pmErrors parameter gives
ERR_INVALID_PARAM instead of OK.


The value as per the spec: #define SA_AMF_PM_ABNORMAL_END 0x4.
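
A sketch of a bitmask check that accepts SA_AMF_PM_ABNORMAL_END. Only the 0x4 value comes from this ticket; the other two constants and their values, the rule that an empty mask is invalid, and the function name are assumptions.

```python
SA_AMF_PM_ZERO_EXIT = 0x1      # assumed value
SA_AMF_PM_NON_ZERO_EXIT = 0x2  # assumed value
SA_AMF_PM_ABNORMAL_END = 0x4   # value quoted in the ticket

VALID_PM_ERRORS = (
    SA_AMF_PM_ZERO_EXIT | SA_AMF_PM_NON_ZERO_EXIT | SA_AMF_PM_ABNORMAL_END
)


def pm_errors_valid(pm_errors):
    """True if pm_errors is a non-empty combination of known bits."""
    return pm_errors != 0 and (pm_errors & ~VALID_PM_ERRORS) == 0


abnormal_ok = pm_errors_valid(SA_AMF_PM_ABNORMAL_END)
unknown_bit_ok = pm_errors_valid(0x8)
```

A check written against the full mask accepts 0x4 on its own and in combination with the other bits, while still rejecting unknown bits.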





---


[tickets] [opensaf:tickets] #185 AMF: unnecessary/wrong updates of pure runtime attributes

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#185] AMF: unnecessary/wrong updates of pure runtime attributes**

**Status:** accepted
**Milestone:** future
**Created:** Tue May 14, 2013 07:20 AM UTC by Nagendra Kumar
**Last Updated:** Mon Jun 02, 2014 05:39 AM UTC
**Owner:** Gary Lee

Migrated from http://devel.opensaf.org/ticket/2227

saAmfCompRestartCount
saAmfCompCurrProxyName


function avd_data_update_req_evh() in avd_ndproc.c


pure runtime attributes should only be updated by the callback.





---


[tickets] [opensaf:tickets] #188 Use of pkill terminates the CLC-SCRIPT with a signal making amf think the component termination failed.

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#188] Use of pkill terminates the CLC-SCRIPT with a signal making 
amf think the component termination failed.**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 07:41 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 07:41 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2330

Scenario:


In a restartable component, make the component return an error in any of the
callbacks. AMF will then try to terminate the component. If the termination
script uses kill -9 on the pid of the component, the termination succeeds.
If pkill is used instead, the script exits with a signal and AMF moves the SU
to the Termination-failed state.
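
For reference, this is how a supervising process can tell a child that exited normally from one that was ended by a signal. It is a generic POSIX illustration, not the amfnd code, and it makes no claim about why the script in this ticket received the signal; the helper name is hypothetical.

```python
import subprocess
import sys


def classify_exit(returncode):
    """Describe a child's exit the way a supervisor might log it."""
    if returncode < 0:
        # subprocess reports death by signal N as returncode -N on POSIX.
        return f"killed by signal {-returncode}"
    return "success" if returncode == 0 else f"exit status {returncode}"


# Child that sends SIGTERM to itself, standing in for a signalled script.
child = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
)
outcome = classify_exit(child.returncode)
```

A script that ends this way never returns an exit status of its own, so the supervisor only sees the signal number.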


Snippet from /var/log/messages:



Cleanup of 
'safComp=pxyXAppSiorder1,safSu=SU_pxyXAppSiorder1,safSg=SG_pxyXAppSiorder,safApp=pxyXAppSiorderApp'
 failed


Nov 21 19:38:17 SLES11-SLOT-2 osafamfnd[5944]: Reason:'Exec of script success, 
but script exits due to a signal'
Nov 21 19:38:17 SLES11-SLOT-2 osafamfnd[5944]: Signal: 15, CLC CLI 
script:'/home/surender/amf/term_proxy.sh'
Nov 21 19:38:17 SLES11-SLOT-2 osafamfnd[5944]: 
'safSu=SU_pxyXAppSiorder1,safSg=SG_pxyXAppSiorder,safApp=pxyXAppSiorderApp' 
Presence State TERMINATING => TERMINATION_FAILED


Note : The component has not registered any signal or handlers.


changeset :3047



Changed 18 months ago by hafe:
  Wouldn't that be the case if the script and the program binary have the same
name?


Changed 18 months ago by surenderk, in reply to hafe:
  The program binary and the script have different names.





---


[tickets] [opensaf:tickets] #174 admin unlock operation on SU in shutting down state should be succeded

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#174] admin unlock operation on SU in shutting down state should be 
succeded**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:16 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 06:16 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2063

Perform a shutdown operation on an SU that is already up in the 2N model and
has an active HA assignment.


Just respond to the CSI set quiescing callback using saAmfResponse_4.


Without calling the saAmfQuiescingComplete API, perform the unlock operation
on the SU, which is in the shutting-down state.


The unlock operation should succeed according to the spec (section 9.4.2,
page 370):


The invocation of this administrative operation transitions the administrative
state of the logical entity designated by the name to which objectName points
to unlocked, provided that the logical entity was previously in the locked or
shutting-down administrative state.


Currently, when the unlock operation is issued on the SU in the shutting-down
state, it returns ERR_TRY_AGAIN.
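
The transition rule quoted from the spec can be written down as a small function. This is a sketch of the rule only; the state strings, the function name and the return code used for an already unlocked entity are assumptions.

```python
LOCKED = "locked"
SHUTTING_DOWN = "shutting-down"
UNLOCKED = "unlocked"


def admin_unlock(admin_state):
    """Return (new_state, result) for an UNLOCK admin operation."""
    if admin_state in (LOCKED, SHUTTING_DOWN):
        # Both states are valid starting points according to the quoted text.
        return UNLOCKED, "SA_AIS_OK"
    # Assumed result when the entity is already unlocked.
    return admin_state, "SA_AIS_ERR_NO_OP"


from_shutting_down = admin_unlock(SHUTTING_DOWN)
from_locked = admin_unlock(LOCKED)
```

Under this rule the shutting-down case is handled exactly like the locked case, rather than being rejected with a retry code.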


Sep 16 17:45:23 SLES11-SLOT-1 osafamfd[24860]: Admin operation is already going


Also, the following needs to be considered while the SU is in the
shutting-down state:


1) saAmfSUReadinessState should transition to shutting down and then to
out of service when the quiescing operation is completed.
Currently, saAmfSUReadinessState is set to out of service as soon as the
shutdown operation is performed. (page 99)


2) saAmfSISUHAState should be set to quiescing or quiesced accordingly while
the shutdown operation is in progress. Currently, saAmfSISUHAState stays
active until the shutdown operation is completed. (page 99)


3) Whenever any other admin operation is performed on the SU in the
shutting-down state, it should return BAD_OPERATION or a corresponding error.
Currently, it gives TRY_AGAIN, which can be given in any context.





---


[tickets] [opensaf:tickets] #173 protection group callback is giving null info when component moves from quiesced to no redundancy

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#173] protection group callback is giving null info when component 
moves from quiesced to no redundancy**

**Status:** review
**Milestone:** 4.5.2
**Created:** Tue May 14, 2013 06:13 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jul 08, 2015 11:48 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2034

Brought up 2 SUs in the 2N model.


Protection group tracking is started with SA_TRACK_CHANGES.
When a standby SU lock is followed by an active SU lock, three protection
group callbacks are expected:


1) For the standby SU lock:


numberOfMembers - 1
notificationBuffer->numberOfItems - 2
The standby component's info is filled with the
SA_AMF_PROTECTION_GROUP_REMOVED change and with haState as zero.


2) For the active SU lock, active to quiesced:


numberOfMembers - 1
notificationBuffer->numberOfItems - 1
The quiesced component's info is filled with
SA_AMF_PROTECTION_GROUP_STATE_CHANGE and with haState as QUIESCED.


3) For the quiesced to no redundancy change:


numberOfMembers - 0
notificationBuffer->numberOfItems - 1
The old quiesced component's info is filled with the
SA_AMF_PROTECTION_GROUP_REMOVED change and with haState as zero.


In the current implementation, the callback info in the third step is not
filled with the old quiesced component's info. numberOfItems is given as zero.
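
The three expected callbacks can be derived from the membership before and after each step. The sketch below is a model of the expectation described above, not AMF code; the function name, the numeric change codes and the numeric haState values are assumptions.

```python
NO_CHANGE, ADDED, REMOVED, STATE_CHANGE = 1, 2, 3, 4  # assumed codes
ACTIVE, STANDBY, QUIESCED = 1, 2, 3                   # assumed haState values


def track_changes(old, new):
    """old/new map component -> haState; returns (numberOfMembers, items)."""
    items = []
    for comp, ha in new.items():
        if comp not in old:
            change = ADDED
        elif old[comp] != ha:
            change = STATE_CHANGE
        else:
            change = NO_CHANGE
        items.append((comp, change, ha))
    for comp in old:
        if comp not in new:
            items.append((comp, REMOVED, 0))  # removed member, haState zero
    return len(new), items


step1 = track_changes({"SU1": ACTIVE, "SU2": STANDBY}, {"SU1": ACTIVE})
step2 = track_changes({"SU1": ACTIVE}, {"SU1": QUIESCED})
step3 = track_changes({"SU1": QUIESCED}, {})
```

In this model step 3 yields zero members but one item, the removed component, which is the entry reported as missing.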



Changed 20 months ago by erannjn:
  Can you please explain step 3? I don't really understand quiesced => no
redundancy. Are you doing an amf-adm lock on SI?


Changed 20 months ago by erannjn:
  Not able to reproduce this. Please let me know what I am missing or not
understanding. Running demo app, 2N, SA_TRACK_CHANGES. Startup:


amf_demo[436]: amf_protection_group_callback():
amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=1, 
items=1
amf_demo[436]: 
--
amf_demo[436]: item=0, change=2, 
comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=1, rank=1
amf_demo[436]: 
--
amf_demo[436]: amf_protection_group_callback():
amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=2, 
items=2
amf_demo[436]: 
--
amf_demo[436]: item=0, change=2, 
comp=safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1, haState=2, rank=2
amf_demo[436]: item=1, change=1, 
comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=1, rank=1
amf_demo[436]: 
--
amf_demo[436]: Dispatched healthCheck 3 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
amf_demo[436]: Dispatched healthCheck 4 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
lock standby SU2:


amf_demo[436]: Dispatched healthCheck 5 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
amf_demo[436]: Dispatched healthCheck 6 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
amf_demo[436]: amf_protection_group_callback():
amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=1, 
items=2
amf_demo[436]: 
--
amf_demo[436]: item=0, change=1, 
comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=1, rank=1
amf_demo[436]: item=1, change=3, 
comp=safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1, haState=0, rank=2
amf_demo[436]: 
--
amf_demo[436]: Dispatched healthCheck 7 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
amf_demo[436]: Dispatched healthCheck 8 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'

Lock active SU1:


amf_demo[436]: Dispatched healthCheck 9 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
amf_demo[436]: Dispatched healthCheck 10 in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
amf_demo[436]:  Dispatched 'CSI Set' in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' CSIName: '' HAState: 
Quiesced CSIFlags: Target All
amf_demo[436]: amf_protection_group_callback():
amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=1, 
items=1
amf_demo[436]: 
--
amf_demo[436]: item=0, change=4, 
comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=3, rank=1
amf_demo[436]: 
--
amf_demo[436]: Dispatched 'CSI Remove' in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' CSI: '' CSIFlags: 
Target All
amf_demo[436]: amf_protection_group_callback():
amf_demo[436]:

[tickets] [opensaf:tickets] #177 N+M redandancy model was coming up with assignments even if component capability used is SA_AMF_COMP_ONE_ACTIVE_OR_ONE_STANDBY.

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#177] N+M redandancy model was coming up with assignments even if 
component capability used is SA_AMF_COMP_ONE_ACTIVE_OR_ONE_STANDBY.**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:21 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 06:21 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2119

As per the AMF B04.01 spec, section 3.6.3.1, page 132: "Components implementing 
any of the capability models described in Section 3.5 on page 107, except the 
1_active_or_1_standby capability model, can participate in the N+M redundancy 
model."


However, I was able to bring up the N+M model with capability 4 and also 
observed the SUSI and CSI HA assignments.


Configuring an N+M model with capability 4 should be detected during 
configuration and rejected.
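
A configuration-time check along these lines could reject the combination. This is only a sketch: `validate_comp_capability` and the `DISALLOWED` table are hypothetical names, not amfd code, and the table records only the single rule quoted from the spec in this ticket. The value 4 is the saAmfCtCompCapability value reported in this ticket.

```python
# Hypothetical sketch of a configuration-time check; not taken from amfd.
# 4 is the saAmfCtCompCapability value reported in this ticket
# (the 1_active_or_1_standby capability model).
ONE_ACTIVE_OR_ONE_STANDBY = 4

# Capabilities that a redundancy model must reject. Only the rule quoted
# in the ticket is recorded here; other models are left unrestricted.
DISALLOWED = {"NplusM": {ONE_ACTIVE_OR_ONE_STANDBY}}


def validate_comp_capability(red_model, capability):
    """Return (ok, reason) for a component capability in a redundancy model."""
    if capability in DISALLOWED.get(red_model, set()):
        return False, "capability %d is not allowed in %s" % (capability, red_model)
    return True, ""
```

With such a check in the validation path, the configuration shown in this ticket would be refused before any SUSI assignment is made.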


# immlist safSupportedCsType=safVersion=4.0.0\,safCSType=safCsi_NpM1,safVersion=4.0.0,safCompType=Comp_NpMApp_npm_1_1

Name                          Type          Value(s)
safSupportedCsType            SA_NAME_T     safSupportedCsType=safVersion=4.0.0\,safCSType=safCsi_NpM1 (58)
saAmfCtDefNumMaxStandbyCSIs   SA_UINT32_T   1 (0x1)
saAmfCtDefNumMaxActiveCSIs    SA_UINT32_T   1 (0x1)
saAmfCtCompCapability         SA_UINT32_T   4 (0x4)
SaImmAttrImplementerName      SA_STRING_T   safAmfService
SaImmAttrClassName            SA_STRING_T   SaAmfCtCsType
SaImmAttrAdminOwnerName       SA_STRING_T   <Empty>


linux-xc76:/opt/goahead/tetware/opensaffire/suites/avsv/regress/imm_auto # amf-state siass ha
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=d_NplusM_1Norm_2\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_3,safApp=NpMApp
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=d_NplusM_1Norm_1\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_1,safApp=NpMApp
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=d_NplusM_1Norm_1\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_2,safApp=NpMApp
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=d_NplusM_1Norm_5\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_5,safApp=NpMApp
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=d_NplusM_1Norm_2\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_4,safApp=NpMApp
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=d_NplusM_1Norm_3\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_5,safApp=NpMApp
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=d_NplusM_1Norm_3\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_6,safApp=NpMApp
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_1,safApp=NpMApp
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_2,safApp=NpMApp
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_3,safApp=NpMApp
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_4,safApp=NpMApp
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=d_NplusM_1Norm_5\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_6,safApp=NpMApp
saAmfSISUHAState=STANDBY(2)

linux-xc76:/opt/goahead/tetware/opensaffire/suites/avsv/regress/imm_auto # 





---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

[tickets] [opensaf:tickets] #172 AVSv:Loops in the csi dependencies should be checked during config validatation and rejetcted

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#172] AVSv:Loops in the csi dependencies should be checked during 
config validatation and rejetcted**

**Status:** review
**Milestone:** 4.5.2
**Created:** Tue May 14, 2013 06:10 AM UTC by Nagendra Kumar
**Last Updated:** Fri Jul 10, 2015 04:46 AM UTC
**Owner:** Praveen

Migrated from http://devel.opensaf.org/ticket/2025

At present, loops are detected when CSI dependencies are added, and an assert 
is triggered if one is found. Loops should instead be checked during 
configuration validation and rejected.
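
As an illustration of the kind of check that could run at validation time, here is a minimal depth-first search that reports one dependency loop if any exists. It is a sketch only: `find_csi_dependency_loop` and the dictionary representation of the dependencies are assumptions for this example, not AMF data structures.

```python
def find_csi_dependency_loop(deps):
    """Return one dependency loop, or None if the graph is loop-free.

    deps maps a CSI name to the list of CSI names it depends on.
    A loop is returned as a list whose first and last names are equal.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in deps.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: that closes a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                loop = visit(nxt)
                if loop:
                    return loop
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(deps):
        if colour.get(start, WHITE) == WHITE:
            loop = visit(start)
            if loop:
                return loop
    return None
```

A validator could call this on the complete set of dependencies in the CCB and reject the configuration when the result is not `None`, instead of asserting later.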





---


[tickets] [opensaf:tickets] #163 AMF: Auto-adjust for standby assignments in NWay red. model does not work

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#163] AMF: Auto-adjust for standby assignments in NWay red. model 
does not work**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 04:39 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 04:39 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/1763

Tested using the UML env and
samples/avsv/campaigns/NwayInstallationCampaign.xml


Adjusted saAmfSGMaxActiveSIsperSU, saAmfSGMaxStandbySIsperSU, and 
saAmfSIPrefStandbyAssignments to one.


safSi=Nway-0,safApp=AmfDemoNway
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)

safSi=Nway-1,safApp=AmfDemoNway
saAmfSIAssignmentState=FULLY_ASSIGNED(2)

safSi=Nway-2,safApp=AmfDemoNway
saAmfSIAssignmentState=FULLY_ASSIGNED(2)

safSISU=safSu=AmfDemoNway-0\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-1,safApp=AmfDemoNway
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=AmfDemoNway-0\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-2,safApp=AmfDemoNway
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=AmfDemoNway-1\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-1,safApp=AmfDemoNway
saAmfSISUHAState=ACTIVE(1)

safSISU=safSu=AmfDemoNway-1\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-2,safApp=AmfDemoNway
saAmfSISUHAState=STANDBY(2)

safSISU=safSu=AmfDemoNway-2\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-0,safApp=AmfDemoNway
saAmfSISUHAState=ACTIVE(1)


safSi=Nway-0,safApp=AmfDemoNway has no standby assignment.



Related to http://devel.opensaf.org/ticket/1746 (see discussion)





---


[tickets] [opensaf:tickets] #164 AMF does not validate existence in model of SU for SaAmfSIRankedSU objects

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#164] AMF does not validate existence in model of SU for 
SaAmfSIRankedSU objects**

**Status:** review
**Milestone:** 4.5.2
**Created:** Tue May 14, 2013 04:41 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jul 01, 2015 07:22 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/1785

If the SU named in an SaAmfSIRankedSU object does not exist, AMF reports no 
error and actually starts using the configuration.





---


[tickets] [opensaf:tickets] #166 AMF: creating an SaAmfSIRankedSU object with rank==0 causes inconsistence

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#166] AMF: creating an SaAmfSIRankedSU object with rank==0 causes 
inconsistence**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 04:48 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 04:48 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/1789


An SaAmfSIRankedSU object with rank 0 can be created but not changed or deleted.


The way AMF implements its SaAmfSIRankedSU DB is suspect. It should be 
redesigned to store SaAmfSIRankedSU objects in a data structure indexed by DN 
instead of by the key SI-rank.
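
To make the suggestion concrete, the sketch below keys the store on the object DN, so create, modify and delete never depend on the rank value. `RankedSuDb` and its methods are hypothetical names invented for this illustration; they are not the amfd implementation.

```python
class RankedSuDb(object):
    """Hypothetical SaAmfSIRankedSU store indexed by object DN."""

    def __init__(self):
        self._by_dn = {}

    def create(self, dn, si, su, rank):
        if dn in self._by_dn:
            raise KeyError("already exists: " + dn)
        self._by_dn[dn] = {"si": si, "su": su, "rank": rank}

    def modify_rank(self, dn, rank):
        # The lookup uses the DN, so any current rank value can be changed.
        self._by_dn[dn]["rank"] = rank

    def delete(self, dn):
        del self._by_dn[dn]

    def ranked_sus(self, si):
        """Return the SUs ranked for an SI, ordered by rank."""
        rows = [row for row in self._by_dn.values() if row["si"] == si]
        return [row["su"] for row in sorted(rows, key=lambda row: row["rank"])]
```

The per-SI ordering is derived on demand, which keeps the rank as ordinary data rather than part of the key.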





---


[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag

2015-07-15 Thread Anders Bjornerstedt

- **status**: assigned --> review



---

** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the 
SA_IMM_CCB_REGISTERED flag**

**Status:** review
**Milestone:** 4.6.1
**Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson
**Last Updated:** Tue Jul 14, 2015 07:00 AM UTC
**Owner:** Johan Mårtensson

The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user 
code to make changes to IMM via CCBs. It unconditionally sets the 
SA_IMM_CCB_REGISTERED_FLAG which means that any code that requires the flag to 
be unset must use the low-level interface instead. The Ccb class should be 
enhanced to allow turning SA_IMM_CCB_REGISTERED off.
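
One possible shape for the enhancement is a constructor argument that defaults to the current behaviour. The sketch below is a stand-in, not the pyosaf class: `CcbSketch`, the `registered` parameter and the `CCB_REGISTERED` value are all assumptions made for illustration, and the class only computes the flag word instead of talking to IMM.

```python
# Assumed stand-in value for the flag the ticket calls SA_IMM_CCB_REGISTERED.
CCB_REGISTERED = 0x1


class CcbSketch(object):
    """Hypothetical stand-in for pyosaf.utils.immom.ccb.Ccb (flags only)."""

    def __init__(self, registered=True):
        # Defaulting to True keeps existing callers unchanged: the flag is set
        # unless the caller explicitly asks for it to be cleared.
        self.flags = CCB_REGISTERED if registered else 0
```

Existing code would keep calling `CcbSketch()` as before, while code that needs the flag unset would pass `registered=False`.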


---


[tickets] [opensaf:tickets] #1418 pyosaf: Add convenience method to clear admin owner role on a set of objects

2015-07-15 Thread Anders Bjornerstedt

- **status**: assigned --> review



---

** [tickets:#1418] pyosaf: Add convenience method to clear admin owner role on 
a set of objects**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Tue Jul 14, 2015 08:03 AM UTC by Johan Mårtensson
**Last Updated:** Tue Jul 14, 2015 08:38 AM UTC
**Owner:** Johan Mårtensson

The Ccb class is very convenient for performing IMM changes but it does not 
clear the admin role after apply. This should be added to avoid having the user 
code fall back to low-level C marshalling to clean up after the CCB.


---


[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag

2015-07-15 Thread Anders Bjornerstedt

- **Milestone**: 4.6.1 --> 4.7-Tentative



---

** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the 
SA_IMM_CCB_REGISTERED flag**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson
**Last Updated:** Wed Jul 15, 2015 12:01 PM UTC
**Owner:** Johan Mårtensson

The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user 
code to make changes to IMM via CCBs. It unconditionally sets the 
SA_IMM_CCB_REGISTERED_FLAG which means that any code that requires the flag to 
be unset must use the low-level interface instead. The Ccb class should be 
enhanced to allow turning SA_IMM_CCB_REGISTERED off.


---


[tickets] [opensaf:tickets] #1414 amfd: N+M wrong assigment with SI dependency and role failure

2015-07-15 Thread Anders Bjornerstedt

- **Milestone**: future --> 4.5.2



---

** [tickets:#1414] amfd: N+M wrong assigment with SI dependency and role 
failure**

**Status:** review
**Milestone:** 4.5.2
**Created:** Mon Jul 13, 2015 05:32 PM UTC by Alex Jones
**Last Updated:** Wed Jul 15, 2015 09:04 AM UTC
**Owner:** Alex Jones

Given the following setup:

6 nodes:

1) 2N SG on nodes 1 and 2
2) N+1 SG on all nodes with SI dependencies for all its SIs with above 2N SI.
3) controllers on nodes 1 and 2 (also hosting payload SGs from 1 and 2)
4) Node 1 has active controller, active 2N assignment, and active N+1 assignment
5) Node 2 has standby controller, standby 2N assignment, and 5 N+M standby 
assignments

If I hard reset node 1, its active N+1 SI gets assigned to another SU that 
already has an active assignment, which is illegal.  And when node 1 comes back 
up, it gets no standby N+1 assignments.  It should get all the standby 
assignments for the N+1 SG.


---


[tickets] [opensaf:tickets] #1410 pyosaf: Invalid exception used in ImmObject (object.py)

2015-07-15 Thread Anders Bjornerstedt

- **Milestone**: 4.6.1 --> 4.5.2



---

** [tickets:#1410] pyosaf: Invalid exception used in ImmObject (object.py)**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Fri Jul 10, 2015 10:11 AM UTC by Johan Mårtensson
**Last Updated:** Fri Jul 10, 2015 10:11 AM UTC
**Owner:** nobody

ImmObject uses an invalid way to raise exceptions:


>>> a = ImmObject('NonExistingClass')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pyosaf/utils/immom/object.py", line 63, in __init__
    raise
TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType
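
The TypeError is what Python 2.7 reports when a bare `raise` runs while no exception is being handled. A fix would raise an explicit exception instead. The sketch below shows the idea with a stand-in class: `ImmObjectSketch`, `KNOWN_CLASSES` and the choice of `ValueError` are assumptions for illustration, not the pyosaf code or its eventual fix.

```python
# Hypothetical stand-in for looking up a class description in IMM.
KNOWN_CLASSES = {"SaAmfApplication"}


class ImmObjectSketch(object):
    """Stand-in for ImmObject that fails with a meaningful exception."""

    def __init__(self, class_name):
        if class_name not in KNOWN_CLASSES:
            # Raise an explicit exception rather than a bare `raise`, which
            # is only valid while another exception is being handled.
            raise ValueError("unknown IMM class: %s" % class_name)
        self.class_name = class_name
```

A caller then sees the offending class name in the error instead of an unrelated TypeError.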



---


[tickets] [opensaf:tickets] #1410 pyosaf: Invalid exception used in ImmObject (object.py)

2015-07-15 Thread Anders Bjornerstedt

- **Version**: 4.5.2 --> 4.5



---

** [tickets:#1410] pyosaf: Invalid exception used in ImmObject (object.py)**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Fri Jul 10, 2015 10:11 AM UTC by Johan Mårtensson
**Last Updated:** Wed Jul 15, 2015 12:46 PM UTC
**Owner:** nobody

ImmObject uses an invalid way to raise exceptions:


>>> a = ImmObject('NonExistingClass')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pyosaf/utils/immom/object.py", line 63, in __init__
    raise
TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType



---


[tickets] [opensaf:tickets] #1398 smf: Add capability to redo CCBs that fail

2015-07-15 Thread Anders Bjornerstedt

- **Milestone**: 4.7-Tentative --> future



---

** [tickets:#1398] smf: Add capability to redo CCBs that fail **

**Status:** unassigned
**Milestone:** future
**Created:** Wed Jul 01, 2015 02:07 PM UTC by Rafael
**Last Updated:** Mon Jul 13, 2015 09:51 AM UTC
**Owner:** nobody

CCBs may fail for a variety of resource-related reasons. SMF campaigns can
be made more robust if they are capable of redoing/replaying a CCB that has 
been aborted. A CCB that is aborted due to a validation error will not succeed
when replayed, but no damage will be done either. A CCB that is aborted for
resource reasons may succeed when replayed, avoiding the abandonment of the
whole campaign.


During the final stages of an upgrade campaign PBE is enabled. PBE is not ready 
until it attaches, so CCB operations will get TRY_AGAIN in that window. Once the
PBE has attached the IMM is persistent-write-available and CCB operations are
allowed again.

Any CCB that was started and added operations *before* the PBE was enabled by
a CCB is doomed. Its operations were generated before the PBE was enabled, and
thus before the PBE had even started, so the PBE is unaware of these
pre-PBE-enable operations. Such a CCB would fail an op-count check in the PBE
during the commit processing of that CCB.

In 4.7-tentative an enhancement #1261 was implemented in the IMM service
to make this abort cleaner, i.e. to avoid the ugly op-count error in the PBE.
The PBE generates an admin-operation to abort *all* open CCBs (all CCBs that
are active but not critical), just before attaching. The problem was that the
first implementation of #1261 resulted in the PBE often attaching as OI *before*
the abort of non-critical CCBs had been processed. When the abort requested by 
the PBE was finally processed it aborted also innocent CCBs that had actually
started *after* the PBE was attached as PBE-OI.

The syndrome as such, i.e. attach of PBE causing the abort of a valid CCB,
could still happen on earlier releases but was quite rare. The syslog
would then show the op-count error reported by the PBE. 

A possible improvement in SMF is to read the runtime-attribute:

   opensafImmNostdFlags

in the OpenSAF IMM object opensafImm=opensafImm,safApp=safImmService

and check that it is not Empty which would mean that PBE is attached.
But it is not really clear why this is needed in 4.7-tentative when it was
not needed earlier. 

CCBs may actually get aborted due to a resource error at any time, not only in
conjunction with PBE enable. A general increase in the robustness of SMF 
campaigns could be achieved by adding logic for redoing CCBs that fail
unexpectedly. If such a CCB was valid, i.e. it was aborted due to a resource
error and not a validation error, then it has a high probability of succeeding
when retried.
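
The redo logic described above could look roughly like the following. This is a sketch under stated assumptions: `CcbAborted`, its `validation` attribute and `apply_with_redo` are hypothetical names, and how SMF would actually distinguish a validation abort from a resource abort is not covered by this ticket.

```python
import time


class CcbAborted(Exception):
    """Hypothetical error: the CCB was aborted; `validation` tells why."""

    def __init__(self, validation):
        Exception.__init__(self, "ccb aborted")
        self.validation = validation


def apply_with_redo(build_and_apply, attempts=3, delay=0.0):
    """Run build_and_apply() until it succeeds or the attempts run out.

    build_and_apply must rebuild the whole CCB each time it is called,
    because an aborted CCB cannot be resumed. Validation aborts are
    re-raised at once since a replay would fail in the same way.
    """
    for attempt in range(1, attempts + 1):
        try:
            return build_and_apply()
        except CcbAborted as err:
            if err.validation or attempt == attempts:
                raise
            time.sleep(delay)
```

The important design point is that the callable replays every operation of the CCB, not just the final apply.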


IMM ticket related to this: #1261


Jun 29 10:36:35 SC-2-2 osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
Jun 29 10:36:35 SC-2-2 osafimmpbed: NO Update epoch 63 committing with ccbId:10185/4294967685
Jun 29 10:36:36 SC-2-2 osafsmfd[4726]: NO CAMP: Start campaign complete actions (95)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Create of PERSISTENT runtime object 'smfRollbackElement=CampComplete,safSmfCampaign=ERIC-CMWUpgrade,safApp=safSmfService' (safSmfCampaign).
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 305 COMMITTED (immcfg_SC-2-1_14718)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 306 COMMITTED (immcfg_SC-2-1_14741)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 307 COMMITTED (immcfg_SC-2-1_14764)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 308 COMMITTED (immcfg_SC-2-1_14787)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 309 COMMITTED (immcfg_SC-2-1_14810)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 310 COMMITTED (immcfg_SC-2-1_14833)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 311 COMMITTED (immcfg_SC-2-1_14856)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 312 COMMITTED (immcfg_SC-2-1_14879)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Create of PERSISTENT runtime object 'smfRollbackElement=ccb_0002,smfRollbackElement=CampComplete,safSmfCampaign=ERIC-CMWUpgrade,safApp=safSmfService' (safSmfCampaign).
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO CCB 313 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Timeout while waiting for implementer, aborting ccb:313
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 313 ABORTED (SMFSERVICE)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA s_info->to_svc == 0 reply context destroyed before this reply could be made
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Failed to send response to agent/client over MDS
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 313 not in correct state (12) for Apply ignoring request
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Spurious and

[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag

2015-07-15 Thread Anders Bjornerstedt

- **Version**: 4.5.2 --> 



---

** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the 
SA_IMM_CCB_REGISTERED flag**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson
**Last Updated:** Wed Jul 15, 2015 12:42 PM UTC
**Owner:** Johan Mårtensson

The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user 
code to make changes to IMM via CCBs. It unconditionally sets the 
SA_IMM_CCB_REGISTERED_FLAG which means that any code that requires the flag to 
be unset must use the low-level interface instead. The Ccb class should be 
enhanced to allow turning SA_IMM_CCB_REGISTERED off.


---


[tickets] [opensaf:tickets] #144 LOG: LOG server hangs with huge log records

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#144] LOG: LOG server hangs with huge log records**

**Status:** unassigned
**Milestone:** future
**Created:** Mon May 13, 2013 11:08 AM UTC by elunlen
**Last Updated:** Mon May 13, 2013 11:48 AM UTC
**Owner:** elunlen

Playing around testing LOG limits I noticed that with a record size of 64kB the 
log server hangs/spins at 100% CPU. 

Found the cause which was a never ending loop in log_stream_write() due to 
variable truncation. In the call to lgs_format_log_record() the 
fixedLogRecordSize parameter is uint16 but needs to be uint32 to match the 
configured value.
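
The effect of the truncation can be shown with a small model. This is not the code of log_stream_write(): `to_uint16` and `chunks_written` are hypothetical helpers, the loop is a simplified stand-in, and the example assumes that "64kB" means a configured size of 65536 bytes.

```python
def to_uint16(value):
    """Model passing a value through a 16-bit unsigned parameter."""
    return value & 0xFFFF


def chunks_written(record_len, fixed_size, max_iterations=1000):
    """Count iterations needed to consume record_len in steps of fixed_size.

    Returns None when the loop is still running after max_iterations,
    which stands in for the never-ending loop described in the ticket.
    """
    remaining, iterations = record_len, 0
    while remaining > 0:
        if iterations >= max_iterations:
            return None
        remaining -= fixed_size
        iterations += 1
    return iterations
```

Under that assumption `to_uint16(65536)` is 0, so a loop that advances by the truncated size makes no progress, whereas the untruncated value completes in a single step.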

Migrated from devel.opensaf.org #2705


---


[tickets] [opensaf:tickets] #135 SI unassignment failed for SU on node lock

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#135] SI unassignment failed for SU on node lock**

**Status:** unassigned
**Milestone:** future
**Created:** Mon May 13, 2013 10:23 AM UTC by surender khetavath
**Last Updated:** Tue May 21, 2013 10:02 AM UTC
**Owner:** nobody

Changeset : 4241 with 27943117 patch
Model : TwoN
configuration: 1 App, 1 SG, 5 SUs with 3 comps each and 5 SIs with 3 CSIs each
SU1 has only 2 comps, i.e. an asymmetric configuration
Transport : TCP/ipv6-linklocal
PBE enabled. 
si-si dependency configured as : Si1-Si2-SI3-Si4

scenario:
---
Initially SU2 (active) is mapped to SC-2 and SU3 (standby) is mapped to PL-3.
Lock the node SC-2. 
A component in SU2 is made to reject the quiescing assignment.
Escalation went up to SuFailover, but the assignments were not removed and SU2 
remained in the TERMINATING presence state. 

States after node lock
---

safAmfNode=SC-2,safAmfCluster=myAmfCluster
saAmfNodeAdminState=LOCKED(2)
saAmfNodeOperState=ENABLED(1)

safSi=TWONSI1,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI2,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI5,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI3,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI4,safApp=TWONAPP
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)


safSu=SU2,safSg=SGONE,safApp=TWONAPP
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=TERMINATING(4)
saAmfSUReadinessState=OUT-OF-SERVICE(1)




---


[tickets] [opensaf:tickets] #123 Sample SMF RPM integration

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#123] Sample SMF RPM integration**

**Status:** unassigned
**Milestone:** future
**Created:** Mon May 13, 2013 08:48 AM UTC by Ingvar Bergström
**Last Updated:** Mon May 13, 2013 08:48 AM UTC
**Owner:** nobody

http://devel.opensaf.org/ticket/1905

 In order to make SMF more usable, OpenSAF should contain a sample integration 
with RPM.

Some ideas:
- an SMF RPM repo, managed by some new scripts
- importing an RPM will create a Bundle object in IMM with install/remove
scripts set up properly
- ETF.xml integrated in the RPM metadata, e.g. the description field of the header
- ETF.xml is needed since install scripts are needed for restartable and
non-restartable components respectively
- non-AMF SW is out of scope
- sample campaigns
- sample use of deploying application-specific IMM configuration

Even better would be integration with yum or zypper. But one step at a time...


Changed 2 years ago by hafe

status changed from new to accepted

Changed 16 months ago by hafe

milestone changed from 4.2.0.GA to 4.3.GA

Changed 2 months ago by hafe

milestone changed from 4.3.GA to future_releases

Changed 7 weeks ago by hafe

owner changed from hafe to ingber
status changed from accepted to assigned


[tickets] [opensaf:tickets] #6 Amfd crashed on active controller

2015-07-15 Thread Anders Bjornerstedt

- **Type**: defect --> enhancement



---

** [tickets:#6] Amfd crashed on active controller**

**Status:** unassigned
**Milestone:** future
**Created:** Mon May 06, 2013 07:21 AM UTC by Nagendra Kumar
**Last Updated:** Tue Mar 24, 2015 10:33 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/3135
==
Changeset : 4200
Transport : TCP/ipv6 ( link local )
patches : 2794
PBE enabled.
Model : NWAY with Si Dep configured.
configuration : 1SG,5SUs,3comps in each su, 5Sis with 3csi each.
Si Dep : si1(Sponsor) - si2 - si3 - si4
SU1,SU2,SU3,SU4 are mapped to sc-1,sc-2,pl-3,pl-4 resp. su5 is also on pl-5. 
SC-1 was active and sc-2 standby. 


scenario:
A campaign was modelled to add one more node pl-5 and SUEXP on PL-5. 


/var/log/messages on SC-1:
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafamfd[20569]: avd_su.c:1551: 
avd_su_dec_curr_stdby_si: Assertion 'su->saAmfSUNumCurrStandbySIs > 0' failed.
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafamfnd[20586]: ER AMF director unexpectedly 
crashed
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafamfnd[20586]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) 
received
Apr 25 12:38:08 OEL-64BIT-SLOT2 opensaf_reboot: Rebooting local node
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafimmnd[20477]: NO Implementer locally 
disconnected. Marking it as doomed 4 22, 2010f (safAmfService)


GDB output
(gdb) bt
#0 0x003c0be328a5 in raise () from /lib64/libc.so.6
#1 0x003c0be34085 in abort () from /lib64/libc.so.6
#2 0x003321a18fbb in osafassert_fail (file=0x4afe07 "avd_su.c", line=1551,
   func=0x4b0cc0 "avd_su_dec_curr_stdby_si",
   assertion=0x4b0c30 "su->saAmfSUNumCurrStandbySIs > 0") at sysf_def.c:301
#3 0x0048e90d in avd_su_dec_curr_stdby_si (su=0x1391120) at avd_su.c:1551
#4 0x00490311 in avd_susi_update_assignment_counters (susi=0x13e0ae0,
   action=AVSV_SUSI_ACT_MOD, current_ha_state=SA_AMF_HA_STANDBY,
   new_ha_state=SA_AMF_HA_ACTIVE) at avd_siass.c:697
#5 0x0048fffc in avd_susi_mod_send (susi=0x13e0ae0,
   ha_state=SA_AMF_HA_ACTIVE) at avd_siass.c:616
#6 0x00477c83 in avd_sg_nway_susi_succ_sg_realign (cb=0x6c2d20,
   su=0x137fc30, susi=0x134c8e0, act=AVSV_SUSI_ACT_DEL,
   state=SA_AMF_HA_QUIESCED) at avd_sgNWayfsm.c:2590
#7 0x00470aef in avd_sg_nway_susi_sucss_func (cb=0x6c2d20,
   su=0x137fc30, susi=0x134c8e0, act=AVSV_SUSI_ACT_DEL,
   state=SA_AMF_HA_QUIESCED) at avd_sgNWayfsm.c:337
#8 0x0047dcd8 in avd_su_si_assign_evh (cb=0x6c2d20, evt=0x7ffcc8001fd0)
   at avd_sgproc.c:859
#9 0x0043def2 in avd_process_event (cb_now=0x6c2d20,
   evt=0x7ffcc8001fd0) at avd_proc.c:591
#10 0x0043dc56 in avd_main_proc () at avd_proc.c:507
#11 0x00409c23 in main (argc=2, argv=0x7fff5035b6a8) at amfd_main.c:47
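
The backtrace shows the crash on a STANDBY to ACTIVE modification: the counter update decrements the SU's standby SI count, and the assertion on saAmfSUNumCurrStandbySIs fires. The following is a simplified model of that bookkeeping, assuming only what the frames show. It is not the amfd source; the class and function names are hypothetical, and why the counter was already zero is not established in this ticket.

```python
from dataclasses import dataclass


@dataclass
class SU:
    """Hypothetical stand-in for the per-SU assignment counters."""
    name: str
    num_curr_active_sis: int = 0
    num_curr_standby_sis: int = 0


def dec_curr_stdby_si(su):
    # Mirrors the failed check: the counter must be positive before decrement.
    assert su.num_curr_standby_sis > 0, "standby SI counter would go negative"
    su.num_curr_standby_sis -= 1


def inc_curr_active_si(su):
    su.num_curr_active_sis += 1


def update_counters_on_mod(su, current_ha_state, new_ha_state):
    """Move one SI assignment from the standby count to the active count."""
    if current_ha_state == "STANDBY" and new_ha_state == "ACTIVE":
        dec_curr_stdby_si(su)
        inc_curr_active_si(su)


if __name__ == "__main__":
    su = SU("SU1", num_curr_standby_sis=1)
    update_counters_on_mod(su, "STANDBY", "ACTIVE")  # counters stay consistent
    try:
        # A second promotion with no standby assignment left trips the check.
        update_counters_on_mod(su, "STANDBY", "ACTIVE")
    except AssertionError as err:
        print("assertion:", err)
```

In this model the assertion can only fire if an assignment is promoted without having been counted as standby first, or is counted down twice, so any path that modifies or deletes a standby assignment is a candidate to review.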