[tickets] [opensaf:tickets] #2418 imm: Info of dead IMMND remains in standby IMMD
If the defect only occurs in a headless system, then I think the ticket slogan, or at least the description, should say so.

---

** [tickets:#2418] imm: Info of dead IMMND remains in standby IMMD**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Apr 10, 2017 10:23 AM UTC by Hung Nguyen
**Last Updated:** Thu Apr 13, 2017 09:49 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2418/attachment/log.tgz) (149.4 kB; application/x-compressed)

When the Standby IMMD comes up at the same time as an IMMND is exiting, the info of that IMMND might not be removed from the **immnd_tree** of the Standby IMMD.

The details of the problem are explained in the sequence diagram below:

[sequence diagram](http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICCBhAKgWgJIFl8ARAKFElnhCWQGVMAhPQ0kkAIwHsAPZTgNwCmYOo2bFkAYjCCAJgC5kRAPIB1AHLJBQmgDMwnALbIC+dUT4JkCTrMHIAGiRJdeA4aKamii3AigoxLQAOgh+Acj4DOjI1LLIAM6CgdHIBgA29hCcdBBx7ACezvReLMjYAHxoWOIW8uic6bIJBQgwaYIAjgCuggkQziQYON7lVSW1ig1NLW0dCcCcCEmhEAAW9qbmyOlQ-chQbenddgnI65uE20vWtvZOzhw8fEIiw7VSMgpKapragnoDMYthYbjY7I5nK4Xh53t4ADQTbyKTAbExXCx7DqGdzxfRGaojMrsbooGSGECHM6HTy1IA)

SC-5 was Active, SC-2 was Standby, and the IMMND on SC-1 was exiting.

~~~
18:35:03 SC-1 osafimmnd[441]: exiting for shutdown
18:35:03 SC-2 osafrded[413]: NO RDE role set to STANDBY
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:568511936070075)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:567412424442298)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:566312912814523)
18:35:03 SC-2 osafimmd[430]: NO MDS event from svc_id 25 (change:3, dest:565213401186744)
18:35:03 SC-5 osafimmd[433]: NO MDS event from svc_id 25 (change:4, dest:564113889558969)
~~~

The down event for IMMND@SC-1 was received on SC-5 but not on SC-2.

**The symptoms:**

1. If the down IMMND is the coordinator, then when that Standby IMMD becomes Active it fails to elect a new coordinator, because there is already a coordinator in the **immnd_tree**.

~~~
18:35:11 SC-2 osafimmd[430]: WA IMMND coordinator at 2050f apparently crashed => electing new coord
~~~

No further logs about a newly elected coordinator were printed.

2. When IMMND@SC-1 comes up again, it fails to introduce itself to the IMMD, because the IMMD already has IMMND@SC-1 in its **immnd_tree** with a wrong epoch.

~~~
18:35:29 SC-1 osafimmnd[441]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
18:35:29 SC-1 osafimmnd[441]: NO This IMMND is now the NEW Coord
18:35:29 SC-1 osafimmnd[441]: ER 3 > 0, exiting
~~~

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
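The first symptom of #2418 can be illustrated with a small model. This is only a sketch under stated assumptions, not the IMMD implementation: `ImmdModel`, `ImmndEntry` and the "lowest node id wins" rule are hypothetical, and the only behavior taken from the ticket is that an existing coordinator entry in the immnd_tree prevents a new election.

```python
from dataclasses import dataclass, field


@dataclass
class ImmndEntry:
    node_id: int
    is_coord: bool = False


@dataclass
class ImmdModel:
    """Toy stand-in for one IMMD and its immnd_tree (node_id -> entry)."""
    immnd_tree: dict = field(default_factory=dict)

    def immnd_up(self, node_id, is_coord=False):
        self.immnd_tree[node_id] = ImmndEntry(node_id, is_coord)

    def immnd_down(self, node_id):
        # Only runs if the down event actually reaches this IMMD.
        self.immnd_tree.pop(node_id, None)

    def elect_coord(self):
        """Return the newly elected node id, or None if nothing is elected."""
        if any(e.is_coord for e in self.immnd_tree.values()):
            return None  # a (possibly stale) coordinator entry blocks election
        if not self.immnd_tree:
            return None
        winner = min(self.immnd_tree)  # hypothetical rule: lowest node id wins
        self.immnd_tree[winner].is_coord = True
        return winner


active, standby = ImmdModel(), ImmdModel()
for immd in (active, standby):
    immd.immnd_up(1, is_coord=True)  # the IMMND that is about to exit
    immd.immnd_up(2)
    immd.immnd_up(5)

active.immnd_down(1)  # the Active IMMD receives the down event
# The Standby IMMD never calls immnd_down(1): the event was missed.

print("active elects:", active.elect_coord())
print("standby elects:", standby.elect_coord())
```

In this model the Standby keeps the dead coordinator in its tree, so a later election on that node finds nothing to do, which matches the missing "newly elected coordinator" log described in the ticket.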
[tickets] [opensaf:tickets] #2398 imm: retry of ccb abort should be allowed if failed with TRY_AGAIN and TIMEOUT
TRY_AGAIN is OK. But ERR_TIMEOUT? I am not sure what you are doing here. The two error codes differ in meaning.

TRY_AGAIN means the client KNOWS that the call was NOT processed.
TIMEOUT means the client does NOT KNOW whether the call was processed in the server or not. TIMEOUT may be (and typically is) generated in the client library when the client has waited too long for a response from the server.

---

** [tickets:#2398] imm: retry of ccb abort should be allowed if failed with TRY_AGAIN and TIMEOUT**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Mar 27, 2017 07:50 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Mar 27, 2017 08:02 AM UTC
**Owner:** Neelakanta Reddy

Steps:

1. Create a ccb.
2. Call saImmOmCcbAbort on the ccb. The return code should be TRY_AGAIN, which can be reproduced when the fevs queue is full:

T2 Too many pending incoming FEVS messages (> 16) enqueueing async message. Backlog:1

saImmOmCcbAbort will create the imma_newCcbId without finalizing the old ccbid.

Solution: do not create a new ccbid when the return code is TRY_AGAIN or TIMEOUT.
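The proposed solution for #2398 can be sketched as follows. This is a model, not the IMM client library: `CcbHandleModel`, `abort_with_retry` and the `send_abort` callback are hypothetical names, and only the numeric values of the three `SaAisErrorT` codes are real. Note that the sketch retries after TIMEOUT as well, which is only reasonable if aborting an already aborted CCB is harmless; that is an assumption here, given that TIMEOUT leaves the outcome unknown.

```python
SA_AIS_OK = 1
SA_AIS_ERR_TIMEOUT = 5
SA_AIS_ERR_TRY_AGAIN = 6
RETRYABLE = {SA_AIS_ERR_TIMEOUT, SA_AIS_ERR_TRY_AGAIN}


class CcbHandleModel:
    """Toy CCB handle: holds the current ccb id and a source of new ids."""

    def __init__(self, ccb_id, next_ccb_id):
        self.ccb_id = ccb_id
        self._next_ccb_id = next_ccb_id  # hypothetical id allocator

    def abort(self, send_abort):
        rc = send_abort(self.ccb_id)
        if rc == SA_AIS_OK:
            # Only a confirmed abort moves the handle on to a new ccb id.
            self.ccb_id = self._next_ccb_id()
        # On TRY_AGAIN or TIMEOUT the old ccb id is kept, so that a retry
        # targets the same CCB instead of leaving it unfinalized.
        return rc


def abort_with_retry(handle, send_abort, max_tries=5):
    """Retry the abort while the return code is TRY_AGAIN or TIMEOUT."""
    rc = None
    for _ in range(max_tries):
        rc = handle.abort(send_abort)
        if rc not in RETRYABLE:
            break
    return rc
```

A caller would pass a `send_abort` function that performs the real request; in a test it can simply replay a list of return codes.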
[tickets] [opensaf:tickets] #2393 Immd got crashed on Active as immnd restarted on Active with cluster having single controller and payload
Note also that the IMMD does not "crash", it exits.

Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER IMM RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at both SCs, exiting

---

** [tickets:#2393] Immd got crashed on Active as immnd restarted on Active with cluster having single controller and payload**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:58 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 23, 2017 05:37 PM UTC
**Owner:** nobody
**Attachments:**

- [PL-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/PL-3.tar.bz2) (558.9 kB; application/x-bzip)
- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/SC-1.tar.bz2) (2.5 MB; application/x-bzip)

###Environment details

OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
2 nodes setup (1 controller and 1 payload)

###Summary

Immd got crashed on Active as immnd restarted on Active with cluster having single controller and payload

###Steps followed & Observed behaviour

1. Bring up the cluster with 1 controller and 1 payload.
2. Kill immnd on the active controller.
3. Observed that immd crashed on the Active controller (SC-1), due to which the payload was also rebooted.

**Issue observed when there is only one controller**

**Syslog**

SC-1:::

Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO Restarting a component of 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:12 SO-SLOT-1 osafsmfd[2235]: WA DispatchOiCallback: saImmOiDispatch() Fail 'SA_AIS_ERR_BAD_HANDLE (9)'
Mar 23 11:06:12 SO-SLOT-1 osafntfimcnd[2181]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:2
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
Mar 23 11:06:12 SO-SLOT-1 osafimmd[2138]: ER IMM RELOAD with NO persistent back end => ensure cluster restart by IMMD exit at both SCs, exiting
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar 23 11:06:12 SO-SLOT-1 osafamfnd[2213]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Mar 23 11:06:12 SO-SLOT-1 opensaf_reboot: Rebooting local node; timeout=60

PL-3:::

Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2280]: ER IMMND forced to restart on order from IMMD, exiting
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Mar 23 11:06:21 SO-SLOT-3 osafamfnd[2290]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: mkfifo already exists: /var/lib/opensaf/osafimmnd.fifo File exists
Mar 23 11:06:21 SO-SLOT-3 osafimmnd[2755]: Started
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: WA AMF director unexpectedly crashed
Mar 23 11:06:26 SO-SLOT-3 osafamfnd[2290]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131855, SupervisionTime = 60

Traces:

From the traces on the Active: 'Failed to find candidate for new IMMND coordinator' and 'Active IMMD has to restart the IMMSv'

~~~
Mar 23 11:06:12.535325 osafimmd [2138:src/imm/immd/immd_evt.c:2638] T5 Received IMMND service event
Mar 23 11:06:12.535349 osafimmd [2138:src/imm/immd/immd_evt.c:2741] T5 PROCESS MDS EVT: NCSMDS_DOWN, my PID:2138
Mar 23 11:06:12.535451 osafimmd [2138:src/imm/immd/immd_evt.c:2748] T5 NCSMDS_DOWN => local IMMND down
Mar 23 11:06:12.535463 osafimmd [2138:src/imm/immd/immd_evt.c:2763] T5 IMMND DOWN PROCESS detected by IMMD
Mar 23 11:06:12.535475 osafimmd [2138:src/imm/immd/immd_proc.c:0618] >> immd_process_immnd_down
Mar 23 11:06:12.535483 osafimmd [2138:src/imm/immd/immd_proc.c:0621] T5 immd_process_immnd_down pid:2149 on-active:1 cb->immnd_coord:2010f
Mar 23 11:06:12.535503 osafimmd [2138:src/imm/immd/immd_proc.c:0628] WA IMMND coordinator at 2010f apparently crashed => electing new coord
Mar 23 11:06:12.535516 osafimmd
~~~
[tickets] [opensaf:tickets] #2393 Immd got crashed on Active as immnd restarted on Active with cluster having single controller and payload
- **Comment**:

Unless this ticket describes a system that has been configured to allow headless operation (SC absence), the above is expected behavior and this ticket is invalid. I see no mention of headless/SC absence. The cluster has to reload because the IMMND at a payload cannot take on the role of coordinator IMMND in a normal configuration.

---

** [tickets:#2393] Immd got crashed on Active as immnd restarted on Active with cluster having single controller and payload**

**Status:** unassigned
**Milestone:** 5.2.RC2
**Created:** Thu Mar 23, 2017 05:58 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 23, 2017 05:58 AM UTC
**Owner:** nobody
**Attachments:**

- [PL-3.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/PL-3.tar.bz2) (558.9 kB; application/x-bzip)
- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2393/attachment/SC-1.tar.bz2) (2.5 MB; application/x-bzip)
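The reasoning in the #2393 comment (a payload IMMND cannot become coordinator in a normal configuration, so the cluster has to reload) can be sketched as a candidate filter. This is an illustration under assumptions, not the IMMD election code: `pick_coordinator` and its "lowest node id" tie-break are hypothetical, and treating a non-zero ScAbsenceAllowed as "payload IMMNDs may be candidates" is an assumption derived from the comment. The node ids are the ones printed in the syslog above.

```python
SC_1 = 131343  # NodeId of SC-1 in the syslog
PL_3 = 131855  # NodeId of PL-3 in the syslog


def pick_coordinator(immnds, sc_absence_allowed):
    """Pick a coordinator from (node_id, is_controller) pairs, or None.

    Assumption: payload IMMNDs only qualify when SC absence is allowed
    (sc_absence_allowed != 0).
    """
    candidates = [node_id for node_id, is_controller in immnds
                  if is_controller or sc_absence_allowed]
    return min(candidates) if candidates else None


# The IMMND on the only controller was killed, so only PL-3 remains.
remaining = [(PL_3, False)]

# ScAbsenceAllowed:0 as in the log line: no candidate, so the IMMD exits
# and the cluster reloads.
print(pick_coordinator(remaining, 0))
```

With a non-zero `sc_absence_allowed` the same call returns the payload node id in this model, which is the configuration the comment says the ticket does not mention.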
[tickets] [opensaf:tickets] #2382 imm: reducing log level for ccb-committed messages
First, this ticket should not be a defect. The log level of the ccb commit messages is intentional, the motive being to have a record of if and when a CCB was committed.

Second, having a record of configuration changes at the OpenSAF level is normally necessary for analyzing a reported problem involving OpenSAF. Many problems are triggered by a configuration change. Having a persistent record of such configuration changes is crucial for understanding or debugging unexpected events or problems in a system. Such troubleshooting does not just cover troubleshooting of OpenSAF, but also troubleshooting of application-level behavior when the configuration of such an application is changed.

Log level NOtice is the lowest log level that is pushed to the syslog by default in OpenSAF. This ticket in fact goes further than just lowering the log level to INfo (which is normally not logged but can be toggled on); it argues for lowering it to trace! So you could end up in a scenario where there is a serious incident on a system, but no way to see from the OpenSAF logs whether any configuration change was involved in triggering the problem. You would need to reproduce the problem with trace or the INfo log level enabled. The problem with trace is that the volumes are so large that it sometimes impacts the behavior of the system, sometimes making it difficult to reproduce the problem.

CCB traffic is very low during normal operation. Only during SMF campaigns or manual reconfigurations of the system would there be CCB traffic of any significance. So log messages of committed CCBs can hardly be a big issue in terms of volume, in general.

In summary: I argue that this ticket is not motivated, and it is by definition not a defect since the current behavior is intentional and well motivated. The motive behind this ticket should be analyzed better and explained better in the ticket. Or the ticket may just be closed.

A slightly better alternative is to introduce a new configuration parameter to specify whether CCB commits are to be logged. The default of that configuration parameter must of course be OFF (the current behavior being the default).

---

** [tickets:#2382] imm: reducing log level for ccb-committed messages**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Mar 16, 2017 09:26 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Mar 16, 2017 09:47 AM UTC
**Owner:** Neelakanta Reddy

    if (i != sOwnerVector.end()) {
        LOG_NO("Ccb %u COMMITTED (%s)", ccb->mId, (*i)->mAdminOwnerName.c_str());
    } else {
        LOG_NO("Ccb %u COMMITTED (%s)", ccb->mId, "");
    }

Reduce the LOG_NO to TRACE
[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects
The slogan of this ticket and the analysis done in it were misleading, since the observed error really had nothing specifically to do with deletion of objects, but rather with the setting of adminOwner over many objects. Most likely that confusion stems from observations made using the immcfg tool, which maps a tool-level delete to more than one IMM API call. The slogan of the ticket could have been changed, but there is no way to delete an incorrect analysis.

---

** [tickets:#2284] IMM: Improper return code without any error string while deleting large number of objects**

**Status:** invalid
**Milestone:** 5.2.RC1
**Created:** Wed Feb 01, 2017 07:13 AM UTC by Chani Srivastava
**Last Updated:** Fri Mar 10, 2017 06:17 AM UTC
**Owner:** nobody

Steps to reproduce:

1. Bring up opensaf on a cluster
2. Create around 10k objects
3. Try deleting these objects in one immcfg operation

Output:

Error Returned - error - saImmOmAdminOwnerSet FAILED: SA_AIS_ERR_LIBRARY (2)
No error string stating the cause of failure is returned.

Syslog - immcfg: ER TOO MANY Object Names line:733

Expected behavior - Proper return code with error string should be returned
[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects
To summarize, the 10k limit is not per CCB but per adminOwnerSet call. The limit has nothing to do with avoiding "corruption in IMM". It simply has to do with the size of some messages being sent over the system.

The ticket needs to be rewritten/redefined. It is probably best to close this one and maybe open a new ticket.

I agree that ERR_LIBRARY is not the correct return code for this case. ERR_NO_RESOURCES would be better. The documentation probably needs an update explaining the limit on the number of objects covered (explicitly, or implicitly by subtree recursion) by an admin-owner-set.

An enhancement could in theory be defined to implement support for setting admin owner over a larger number of objects using one IMM API call. But that use case is very rare outside of testing, and a workaround should exist: the application (or the immtools internally) can generate more than one admin-owner-set call, still within the same CCB.

---

** [tickets:#2284] IMM: Improper return code without any error string while deleting large number of objects**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Wed Feb 01, 2017 07:13 AM UTC by Chani Srivastava
**Last Updated:** Fri Mar 03, 2017 01:18 PM UTC
**Owner:** nobody
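The workaround suggested for #2284 (several admin-owner-set calls instead of one oversized call) amounts to batching the object names. A minimal sketch, assuming a hypothetical `set_call` callback that stands in for one admin-owner-set request and a `limit` chosen by the caller; the exact limit enforced by the library is not stated in the ticket, so it is a parameter here.

```python
SA_AIS_OK = 1  # value of SA_AIS_OK in SaAisErrorT


def batches(names, limit):
    """Yield consecutive slices of names, each at most limit long."""
    if limit < 1:
        raise ValueError("limit must be positive")
    for start in range(0, len(names), limit):
        yield names[start:start + limit]


def admin_owner_set_in_batches(object_names, set_call, limit):
    """Issue one set_call per batch and stop at the first failure."""
    for batch in batches(object_names, limit):
        rc = set_call(batch)
        if rc != SA_AIS_OK:
            return rc
    return SA_AIS_OK


# Demo with a stand-in for the real call that only records batch sizes.
sizes = []

def record_call(batch):
    sizes.append(len(batch))
    return SA_AIS_OK

names = [f"test={i}" for i in range(25)]
rc = admin_owner_set_in_batches(names, record_call, limit=10)
print(rc, sizes)
```

All batches would still belong to the same CCB; only the admin-owner-set step is split.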
[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects
Note also that it is adminOwnerSet that fails with ERR_LIBRARY:

saImmOmAdminOwnerSet FAILED: SA_AIS_ERR_LIBRARY (2)

and not saImmOmCcbObjectDelete.

---

** [tickets:#2284] IMM: Improper return code without any error string while deleting large number of objects**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Wed Feb 01, 2017 07:13 AM UTC by Chani Srivastava
**Last Updated:** Fri Mar 03, 2017 01:11 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2284 IMM: Improper return code without any error string while deleting large number of objects
I am not aware of any size limitation for CCBs in IMM as such. Even if there were, exceeding it would result in some kind of explicit resource/timeout error for that case and absolutely not "database corruption".

There IS a size limitation on the total number of objects in the database. If I remember correctly, it is 300K objects of average size 300 bytes (?); it is in the IMM_README.

There may be (probably is) a limit on CCB size for the immcfg tool?

Most likely the problem observed here *is* due to some kind of library issue. Someone should check the imm library code for adminOwnerSet for the cases that can return ERR_LIBRARY.

---

** [tickets:#2284] IMM: Improper return code without any error string while deleting large number of objects**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Wed Feb 01, 2017 07:13 AM UTC by Chani Srivastava
**Last Updated:** Wed Feb 01, 2017 09:02 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2323 imm: CCB operations fail after SC absence (Headless)
- **summary**: imm: CCB operations fail after SC absence --> imm: CCB operations fail after SC absence (Headless)
- **Comment**:

Added the "headless" clarification because "SC absence" can be misunderstood as just one (out of normally two) SCs being absent.

---

** [tickets:#2323] imm: CCB operations fail after SC absence (Headless)**

**Status:** review
**Milestone:** 5.0.2
**Created:** Thu Feb 23, 2017 03:36 PM UTC by Hung Nguyen
**Last Updated:** Wed Mar 01, 2017 07:04 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs_n_traces.tgz](https://sourceforge.net/p/opensaf/tickets/2323/attachment/logs_n_traces.tgz) (658.6 kB; application/gzip)

Steps to reproduce:

~~~
1. Start SC-1
2. Commit some CCBs
# immcfg -c Test test=0
# immcfg -c Test test=1
# immcfg -c Test test=2
# immcfg -c Test test=3
3. Start PL-3
4. Restart SC-1
5. When SC-1 is back, it fails to add operations to CCB
# immcfg -c Test test=10
error - saImmOmCcbObjectCreate_2 FAILED with SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
~~~

**cb->mLatestCcbId** was not updated on PL-3 when it joined the cluster, so it still had the value zero. When SC-1 came back from headless, the IMMND on PL-3 sent a re-introduce message to the IMMD on SC-1 with **cb->mLatestCcbId = 0**. The IMMD therefore failed to update **cb->ccb_id_count**, so when a new CCB is created, its id starts from **0+1** instead of **mLatestCcbId + 1**. That results in a conflict with the CCB in **sCcbVector** and in the failure of the CCB operation.

Logs and traces are attached.
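The id collision described in #2323 can be reproduced with a toy counter. This is a sketch under assumptions: `ImmdCcbCounter` is a hypothetical stand-in for the IMMD side, and taking the maximum of the reported values on re-introduction is an assumed update rule; only the names `ccb_id_count` and `mLatestCcbId` and the "0+1" outcome come from the ticket.

```python
class ImmdCcbCounter:
    """Toy model of the IMMD ccb id counter after a headless period."""

    def __init__(self):
        self.ccb_id_count = 0  # a restarted IMMD starts from zero

    def on_reintroduce(self, latest_ccb_id):
        # Assumed rule: adopt the highest ccb id any IMMND reports.
        self.ccb_id_count = max(self.ccb_id_count, latest_ccb_id)

    def new_ccb_id(self):
        self.ccb_id_count += 1
        return self.ccb_id_count


# Four CCBs were committed before PL-3 joined (test=0 .. test=3).
known_ccb_ids = {1, 2, 3, 4}

buggy = ImmdCcbCounter()
buggy.on_reintroduce(0)  # PL-3 reports a stale mLatestCcbId of 0
buggy_id = buggy.new_ccb_id()

fixed = ImmdCcbCounter()
fixed.on_reintroduce(max(known_ccb_ids))  # PL-3 reports the synced value
fixed_id = fixed.new_ccb_id()

print("buggy id collides:", buggy_id in known_ccb_ids)
print("fixed id collides:", fixed_id in known_ccb_ids)
```

The first case hands out an id that already exists in the set of known CCBs, which is the conflict the ticket attributes to **sCcbVector**.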
[tickets] [opensaf:tickets] #2229 imm:disable pbe should honor critical ccbs
- **Comment**:

I have several problems with this ticket.

First, the problem description is both incorrect and incomplete. Point 7 is incorrect because the alternative and simplest way to clear the issue is to re-enable the PBE. It even says so in the warning message pasted in the ticket.

Second, the description is incomplete because it does not describe the application (point 3) in any detail. It just says run "multiple" ccb operations. The application types supported by IMM for CCBs are (a) operator-initiated configuration changes, (b) operator-initiated management procedures and (c) upgrade campaigns. Now both (a) and (b) are defined as being limited in size and time. Put another way, if the configuration change is massive, then it probably should go into a campaign. Or if the "configuration change" is some kind of high-throughput continuous ... h test (?), then that is not a valid test in itself. The SAF IMM service is not designed to support high-throughput applications.

If you nevertheless insist on using some kind of automated, continuously CCB-generating application (which by definition is not using the IMM for storing just config data), then at the very least any upgrade campaign needs to be made aware of the nonconformant application, so that the campaign can quiesce the deviant application before starting the upgrade proper. But of course the improper application should not be there in the first place.

A proposed "fix" has been sent for review. But as I understand that fix, it does not fix the problem. It only reduces the likelihood of it persisting. It is a timer-based solution.

So the fix is an "enhancement" type fix for a problem that lies outside the scope of what the SAF IMM service is intended to support. It is then possibly an "enhancement". But I would still argue that it is a "bad" enhancement, since it does not truly remove the problem and it invites misunderstanding/misuse of the IMM service.

---

** [tickets:#2229] imm:disable pbe should honor critical ccbs**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Wed Dec 14, 2016 09:29 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Dec 14, 2016 09:47 AM UTC
**Owner:** Neelakanta Reddy

Steps to reproduce:

1. Bring up the cluster with PBE configured.
2. Enable PBE.
3. Run multiple ccb operations in parallel.
4. Disable PBE.
5. On one of the payloads/controllers, restart the immnd/node.
6. The sync will be aborted with the following messages:

WA PBE has been disabled with ccbs in critical state - To resolve: Enable PBE or resart/reload the cluster
NO Still waiting for existing Ccbs to terminate after 20.027520 seconds. Aborting this sync attempt

7. The IMMND will never get synced until cluster restart.

The problem is observed when a node fails to join during a middleware upgrade, and eventually the upgrade fails.
[tickets] [opensaf:tickets] #1747 IMMND trying to start PBE process while stopping OpenSAF services
- **Comment**:

Instead of seeing this as a minor defect, it could be seen as part of enhancement #56. There are a lot of "misleading" log messages when shutting down OpenSAF. The main reason is that OpenSAF's intended normal use is to never shut down (except during testing).

---

** [tickets:#1747] IMMND trying to start PBE process while stopping OpenSAF services**

**Status:** unassigned
**Milestone:** 5.0.RC2
**Created:** Mon Apr 11, 2016 10:30 AM UTC by Chani Srivastava
**Last Updated:** Wed Apr 13, 2016 06:55 AM UTC
**Owner:** nobody

Setup:
Changeset- 7436
Version - opensaf 5.0
1-PBE enabled

The issue is not always observed.

Apr 11 13:32:52 OSAF-SC1 opensafd: Stopping OpenSAF Services
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Shutdown initiated
Apr 11 13:32:52 OSAF-SC1 osafamfnd[29960]: NO Terminating all AMF components
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: NO IMM PBE received SIG_TERM, closing db handle
Apr 11 13:32:52 OSAF-SC1 osafimmpbed: IN IMM PBE process EXITING...
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. Marking it as doomed 18 <545, 2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 18 <545, 2010f> (OpenSafImmPBE)
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: WA Persistent back-end process has apparently died.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO STARTING PBE process.
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO pbe-db-file-path:/home/chani/immPBE/imm.db VETERAN:1 B:0
Apr 11 13:32:53 OSAF-SC1 osafckptnd[30049]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafsmfd[29976]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osaflckd[30057]: exiting for shutdown
Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected.
Marking it as doomed 2412 <321, 2010f> (safLckService) Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer disconnected 2412 <321, 2010f> (safLckService) Apr 11 13:32:53 OSAF-SC1 osaflcknd[30032]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafclmna[29860]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafimmd[29888]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osaffmd[29878]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafrded[29869]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafevtd[30088]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. Marking it as doomed 2413 <315, 2010f> (safEvtService) Apr 11 13:32:53 OSAF-SC1 osafckptd[30097]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: NO Implementer locally disconnected. Marking it as doomed 2411 <330, 2010f> (safCheckPointService) Apr 11 13:32:53 OSAF-SC1 osafimmnd[29899]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafmsgd[30011]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafmsgnd[29995]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafsmfnd[29978]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osaflogd[29914]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafntfimcnd[5780]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9) Apr 11 13:32:53 OSAF-SC1 osafclmd[29940]: exiting for shutdown Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[0] == '/usr/lib64/opensaf/osafimmpbed' Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[1] == '--recover' Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[2] == '--pbe' Apr 11 13:32:53 OSAF-SC1 osafimmpbed: IN arg[3] == '/home/chani/immPBE/imm.db' Apr 11 13:32:53 OSAF-SC1 osafimmpbed: ER osafimmpbe is not started by osafimmnd --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. 
[tickets] [opensaf:tickets] #48 IMM: Support for transactionally safe reads
- **status**: accepted --> assigned
- **assigned_to**: Anders Bjornerstedt --> Zoran Milinkovic

---

** [tickets:#48] IMM: Support for transactionally safe reads**

**Status:** assigned
**Milestone:** 5.0.FC
**Created:** Wed May 08, 2013 07:48 AM UTC by Anders Bjornerstedt
**Last Updated:** Sun Nov 01, 2015 09:36 PM UTC
**Owner:** Zoran Milinkovic

Migrated from: http://devel.opensaf.org/ticket/3111

The Ccb concept as defined by the IMM SAF standard does not include any support for safe reads, that is, object reads that are protected and part of a ccb/transaction.

The closest thing it has to safe reads is the admin-owner concept. By setting admin-owner not just for the objects to be changed, but also for objects included in the read-set of the ccb, the risk of the CCB being committed with an inconsistent read-set is reduced but not eliminated. The reason the risk is not eliminated is that concurrent CCBs are allowed under the same admin-owner. Another reason is that it is all too easy for applications to perform non-repeatable reads using accessor-get or iterations, without remembering to set admin-owner over the read objects. I suspect that this is the rule rather than the exception.

The 'read-set' for a ccb/transaction is the set of objects that the ccb/transaction needs to read (and have unchanged or only changed by the same ccb/transaction) until the ccb/transaction terminates.

The cardinal example would be an OI doing validation in the completed callback. In general the OI needs to validate the changes not only within the limited context of the changed objects, but also relative to other objects that may not be changed by that specific transaction. Currently all OIs would need to maintain internal copies of all config data that they manage to achieve that. With safe-read this is no longer necessary. Some interrelated datamodels may also be managed by several OIs.
The safe-read mechanism uses shared locking, allowing several OIs to safe-read access the same objects from different CCBs.

This enhancement proposes to add an additional ccb related function for reading an object and associating that read with what is (or is equivalent to) a shared read lock.

The OpenSAF IMM implementation already implements exclusive write locks for create/delete/modify operations in a ccb. Thus a Ccb that succeeds in invoking such a mutating operation will reserve exclusive write access to that object until the Ccb is terminated by commit or abort. The exclusivity is only in relation to other CCB operations (including safe reads). The accessor and iteration APIs still allow other processes to perform non-repeatable reads, i.e. non-transactional reads, i.e. unsafe reads, concurrently with an open CCB that is mutating such objects. Such unsafe reads are allowed without considering changes pending in ongoing CCBs or what admin-owner is set for the object.

The new API that is proposed looks like this:

saImmOmCcbObjectRead(SaImmCcbHandleT ccbHandle, SaConstStringT objectName, const SaImmAttrNameT *attributeNames, SaImmAttrValuesT_2 ***attributes);

It has a signature very similar to saImmOmAccessorGet_2, with the only difference being that it takes a ccbHandle instead of an accessorHandle. The semantics of the API are identical to accessorGet, with the exception that any returned config attributes are from the *latest* version of the object that is locked by this ccb.

This operation will succeed unless the object is write locked by another Ccb. It will succeed if the object is not locked by other Ccbs or if it is only read-locked (shared) by other Ccbs. Another Ccb trying to write lock this object when this ccb has a shared read-lock will fail and have to wait at least until after this ccb is terminated.
Another Ccb trying to read lock this object when this ccb has a shared read-lock will succeed and obtain a read-lock.

A safe read on an object that is already write locked by the same Ccb for create or modify will succeed, but will not change the lock-type, and will provide the current latest version of the object in the context of the CCB. A safe read on an object that is already write locked by the same Ccb for delete will fail with ERR_NOT_EXIST. Thus any modifications done to the object by this ccb but not yet committed will be reflected in the result returned by the safe read call.

All of the above should be recognized as pretty much standard transactional behavior for the OM API.

What then about implementers and the OI API? After all, one typically important type of participant in a Ccb is the OI performing validation of the CCB. Validation should normally include reading both data modified by the Ccb and data not modified by the ccb that still needs to be part of the read-set for the transaction, in order to commit without violating integrity constraints.

The proposal is for the OI to obtain a ccb-handle using the existing saImmOiAugmentCcbInitialize API. Then to use the
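The lock compatibility rules described in this ticket can be summarized in a small model. The following is a minimal sketch in Python, not the OpenSAF implementation: `LockTable`, `LockConflict` and all method names are hypothetical and exist only for illustration. It also assumes that a Ccb which is the sole reader of an object may take the write lock itself, which the ticket text does not state.

```python
class LockConflict(Exception):
    """Raised when a CCB cannot obtain the lock it asks for."""


class LockTable:
    """Toy model: per object, one exclusive writer or any number of shared readers."""

    def __init__(self):
        self.writer = {}   # object name -> (ccb id, operation) holding the write lock
        self.readers = {}  # object name -> set of ccb ids holding shared read locks

    def write_lock(self, ccb, obj, op="modify"):
        """Exclusive lock for create/modify/delete inside a CCB."""
        holder = self.writer.get(obj)
        if holder is not None and holder[0] != ccb:
            raise LockConflict(f"{obj} is write locked by ccb {holder[0]}")
        if self.readers.get(obj, set()) - {ccb}:
            raise LockConflict(f"{obj} is read locked by another ccb")
        self.writer[obj] = (ccb, op)

    def safe_read(self, ccb, obj):
        """Shared read lock; returns the lock type held by ccb afterwards."""
        holder = self.writer.get(obj)
        if holder is not None:
            if holder[0] != ccb:
                raise LockConflict(f"{obj} is write locked by ccb {holder[0]}")
            if holder[1] == "delete":
                # Object is deleted in the context of this CCB.
                raise LookupError("ERR_NOT_EXIST")
            return "write"  # already write locked by this CCB: lock type unchanged
        self.readers.setdefault(obj, set()).add(ccb)
        return "read"

    def terminate(self, ccb):
        """Commit or abort: release every lock held by the CCB."""
        self.writer = {o: h for o, h in self.writer.items() if h[0] != ccb}
        for holders in self.readers.values():
            holders.discard(ccb)
```

In this model two CCBs can safe-read the same object at once, a third CCB is refused the write lock until both readers have terminated, and the unlocked accessor/iteration reads mentioned in the ticket are deliberately left out because they never consult the lock table.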
[tickets] [opensaf:tickets] #1554 imm: validation abort should have precedence over resource abort
- Description has changed:

Diff:

--- old
+++ new
@@ -1,4 +1,4 @@
-When CCB is applied, the CCB may receive multiple error strings from more OIs.
-Ticket #744 implemented validation/resource abort error strings in the way that the precedence has the first receiving error string. Other validation/resource abort will be ignored.
+When CCB is applied, the CCB may receive multiple error strings from several OIs.
+Ticket #744 implemented validation/resource abort error strings in the way that precedence was given the first received error string. Subsequent strings where ignored.
 
 Validation abort reason is more significant than resource abort, and it must override resource abort error string.

---

** [tickets:#1554] imm: validation abort should have precedence over resource abort**

**Status:** accepted
**Milestone:** 4.7.RC1
**Created:** Wed Oct 21, 2015 10:38 AM UTC by Zoran Milinkovic
**Last Updated:** Wed Oct 21, 2015 10:42 AM UTC
**Owner:** Zoran Milinkovic

When a CCB is applied, the CCB may receive multiple error strings from several OIs.
Ticket #744 implemented validation/resource abort error strings in such a way that precedence was given to the first received error string. Subsequent strings were ignored.

A validation abort reason is more significant than a resource abort, and it must override the resource abort error string.
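The precedence rule requested in this ticket can be sketched as follows. This is a minimal illustration in Python and not the IMM code: `AbortKind` and `select_abort_reason` are hypothetical names, and keeping the first received string among reports of the same kind is an assumption carried over from the #744 behavior described above.

```python
from enum import IntEnum


class AbortKind(IntEnum):
    # Higher value means more significant abort reason.
    RESOURCE = 1
    VALIDATION = 2


def select_abort_reason(reports):
    """Pick the error string to return for an aborted CCB.

    reports: list of (AbortKind, error string) in the order received from OIs.
    A validation abort overrides a resource abort; among reports of the same
    kind the first one received is kept.
    """
    chosen = None
    for kind, text in reports:
        if chosen is None or kind > chosen[0]:
            chosen = (kind, text)
    return chosen
```

With reports arriving as resource abort, validation abort, validation abort, the function returns the first validation abort, whereas a pure first-received rule would have returned the resource abort.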
[tickets] [opensaf:tickets] #1514 Opensaf on payload failed to come up and IMMD on active controller faulted
I see this as a duplicate of #1291, which is closed as invalid. The basic problem is communication overload. The only currently available solution for deployments that see this issue is to reduce the value of the opensafImmSyncBatchSize config attribute in the OpenSAF IMM service object:

opensafImm=opensafImm,safApp=safImmService

Beyond this, there are various enhancements, in MDS or OpenSAF, that could potentially reduce the risk of communication overload.

---

** [tickets:#1514] Opensaf on payload failed to come up and IMMD on active controller faulted**

**Status:** assigned
**Milestone:** 4.7.RC1
**Created:** Mon Oct 05, 2015 10:03 AM UTC by Ritu Raj
**Last Updated:** Wed Oct 07, 2015 12:37 PM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [1513.tgz](https://sourceforge.net/p/opensaf/tickets/1514/attachment/1513.tgz) (7.1 MB; application/x-compressed-tar)

Setup:
Changeset- 6901
4 nodes configured with single PBE and a load of 30K objects

Issue observed
* Payload failed to join the cluster and later the active controller rebooted

Steps performed:
* Started OpenSAF on the controller SC-1 and SC-1 took the active role.
Oct 5 12:33:31 SLES-64BIT-SLOT1 osafrded[3129]: NO No peer available => Setting Active role for this node
Later, started opensaf on slot-2, for which opensafd failed because the disk was full. Resolved the issue and restarted opensaf on slot-2, which ensured that both nodes joined the cluster.
Oct 5 12:45:34 SLES-32BIT-SLOT2 osafrded[15186]: NO Peer rde@2010f has active state => Assigning Standby role to this node
* After the controllers formed the cluster, started opensaf on the remaining two payloads at the same time.
* PL-3 joined the cluster successfully.
* Oct 5 13:03:19 SLES-64BIT-SLOT3 kernel: [495958.582544] TIPC: Own node address <1.1.3>, network identity 5234 Oct 5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 17601 Oct 5 13:09:34 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Epoch set to 125 in ImmModel Oct 5 13:09:35 SLES-64BIT-SLOT3 osafimmnd[15392]: NO Implementer (applier) connected: 27 (@OpenSafImmReplicatorB) <0, 2010f> * PL-4 failed to join the cluster, Oct 5 13:03:38 SLES-32BIT-SLOT4 kernel: [436326.659526] TIPC: Own node address <1.1.4>, network identity 5234 Oct 5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added) Oct 5 13:03:38 SLES-32BIT-SLOT4 osafimmnd[8781]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING Oct 5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - problems with MDS ? 5 Oct 5 13:03:43 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - problems with MDS ? 5 ... Oct 5 13:04:28 SLES-32BIT-SLOT4 osafimmnd[8781]: WA Resending introduce-me - problems with MDS ? 50 Oct 5 13:04:29 SLES-32BIT-SLOT4 osafimmnd[8781]: ER Failed to load/sync. Giving up after 51 seconds, restarting.. Oct 5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed DESC:IMMND Oct 5 13:04:29 SLES-32BIT-SLOT4 opensafd[8736]: ER Going for recovery ...Oct 5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed to load/sync. Giving up after 51 seconds, restarting.. Oct 5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Could Not RESPAWN IMMND Oct 5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER Failed DESC:IMMND Oct 5 13:06:41 SLES-32BIT-SLOT4 opensafd[8736]: ER FAILED TO RESPAWN Oct 5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER IMMND - Periodic server job failed Oct 5 13:06:41 SLES-32BIT-SLOT4 osafimmnd[8856]: ER Failed, exiting... 
Oct 5 13:06:41 SLES-32BIT-SLOT4 kernel: [436509.187946] TIPC: Disabling bearer * After the opensafd failed to come up on PL-4, SC-1 rebooted with IMMD exiting. Oct 5 13:08:52 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting PBE_PRTO_PURGE_MUTATIONS, epoch:123 Oct 5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO ImmModel::getPbeOi reports missing PbeOi locally => unsafe Oct 5 13:08:53 SLES-64BIT-SLOT1 osafimmnd[3163]: NO Coord broadcasting PBE_PRTO_PURGE_MUTATIONS, epoch:123 Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO SU failover probation timer started (timeout: 12000 ns) Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO Performing failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1) Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover' Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover' Oct 5 13:08:53 SLES-64BIT-SLOT1 osafamfnd[3239]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:healthCheckcallbackTimeout Recovery is:suFailover * PL-4 joined the cluster, after opensafd is started on PL-4 after some
[tickets] [opensaf:tickets] #1526 imm: 1PBE can see db as locked
- **status**: review --> accepted
- **Comment**:

I nack'ed the patch because the imm service already has a restart mechanism for the PBE if it gets stuck, and the symptom shown here must result from a bug (if this truly is on 1PBE). If there is not enough information to locate the bug, then the problem needs to be reproduced with trace. If it cannot be reproduced, then we close the ticket as not reproducible.

---

** [tickets:#1526] imm: 1PBE can see db as locked**

**Status:** accepted
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:58 AM UTC
**Owner:** Neelakanta Reddy

When the disk is full, sqlite will return an error.

Sep 18 13:42:02 SC-2 osafimmpbed: ER SQL statement ('COMMIT TRANSACTION') failed because: disk I/O error
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Invalid error reported implementer 'OpenSafImmPBE', Ccb 321 will be aborted
Sep 18 13:42:02 SC-2 osafimmnd[13067]: NO Ccb 321 ABORTED (TraceC)
Sep 18 13:42:02 SC-2 osafimmpbed: WA Failed to find CCB object for 141/321

Due to continuous CCB operations (even though the disk is full), the 1PBE has been logging the following messages for more than 3 hours:

messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:46 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages:Sep 18 17:58:47 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages.7:Sep 18 14:22:22 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages.7:Sep 18 14:22:23 SC-2 osafimmpbed: WA Sqlite db locked by other thread.
messages.7:Sep 18 14:22:24 SC-2 osafimmpbed: WA Sqlite db locked by other thread

After freeing the space, the PBE is still stuck in "Sqlite db locked by other thread". This prevents any further operations. Once the PBE is killed, the imm.db is re-generated and the CCB operations are applied.

Solution(1PBE):
For the 1PBE case, which is not multi threaded, if the sqlite db locked case is reached, abort the PBE and let the PBE be re-generated (instead of blocking the PBE process).
[tickets] [opensaf:tickets] #1526 imm: exit the 1PBE when pbeBeginTrans sees db as locked
Question: How can this case happen for the 1PBE case, when there is only one user thread using the sqlite instance? Another relevant question is why/when do you observe this now? The test case or test setup must be special somehow. With only one thread this case should be impossible. It suggests heap corruption could be the cause.

Some years ago we did see problems, although not exactly of this kind, in conjunction with repeated failovers, where the new PBE managed to start while the old PBE (on the other SC) was still executing (slow to terminate). But the distributed file level protection uses file system locking, and the symptoms should be different.

---

** [tickets:#1526] imm: exit the 1PBE when pbeBeginTrans sees db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:21 AM UTC
**Owner:** Neelakanta Reddy
[tickets] [opensaf:tickets] #1526 imm: exit the 1PBE when pbeBeginTrans sees db as locked
I looked at the code and the error message is correct but the "lock" is the PBE "spin lock" created for handling 2PBE. The fact that it finds it locked in 1PBE means there is a logical bug somewhere in 1PBE. Most likely some error case where there is a bailout from commit processing without correct cleanup.

---

** [tickets:#1526] imm: exit the 1PBE when pbeBeginTrans sees db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:43 AM UTC
**Owner:** Neelakanta Reddy
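The kind of bug suspected in this thread, a bail-out from commit processing that skips releasing the lock, can be illustrated with a toy model. This is only a sketch in Python under stated assumptions: `PbeModel` and its methods are hypothetical and do not correspond to functions in osafimmpbed; the error text is borrowed from the log message quoted in the ticket.

```python
class PbeModel:
    """Toy stand-in for a single-threaded PBE guarding its db with a flag lock."""

    def __init__(self):
        self.locked = False

    def begin(self):
        # With one thread, finding the lock taken means an earlier
        # transaction left it behind.
        if self.locked:
            raise RuntimeError("Sqlite db locked by other thread")
        self.locked = True

    def commit_leaky(self, fail):
        """Buggy variant: the error path returns without releasing the lock."""
        self.begin()
        if fail:  # e.g. COMMIT fails with a disk I/O error
            return False
        self.locked = False
        return True

    def commit_safe(self, fail):
        """Fixed variant: the lock is released on every exit path."""
        self.begin()
        try:
            return not fail
        finally:
            self.locked = False
```

After one failed `commit_leaky` call every later transaction is rejected until the process is restarted, which matches the symptom of the message repeating for hours. `commit_safe` recovers on the next attempt.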
[tickets] [opensaf:tickets] #1526 imm: exit the 1PBE when pbeBeginTrans sees db as locked
I guess it could be that the pbe level message "Sqlite db locked by other thread" is plain wrong, i.e. misleading.

---

** [tickets:#1526] imm: exit the 1PBE when pbeBeginTrans sees db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:35 AM UTC
**Owner:** Neelakanta Reddy
[tickets] [opensaf:tickets] #1526 imm: 1PBE can see db as locked
- **summary**: imm: exit the 1PBE when pbeBeginTrans sees db as locked --> imm: 1PBE can see db as locked
- **Comment**:

Changed ticket slogan to describe the problem.

---

** [tickets:#1526] imm: 1PBE can see db as locked**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Oct 07, 2015 09:43 AM UTC by Neelakanta Reddy
**Last Updated:** Thu Oct 08, 2015 07:55 AM UTC
**Owner:** Neelakanta Reddy
[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7
- **status**: review --> fixed
- **Comment**:

changeset: 6979:2a5befe801cf
tag: tip
parent: 6977:93c7269c4797
user: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date: Wed Oct 07 12:42:52 2015 +0200
summary: IMM: Update immsv/README describing imm enhancements in 4.7 [#1499]

changeset: 6978:46eae48ebfba
branch: opensaf-4.7.x
parent: 6976:1736dee70266
user: Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date: Wed Oct 07 12:42:52 2015 +0200
summary: IMM: Update immsv/README describing imm enhancements in 4.7 [#1499]

---

** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7**

**Status:** fixed
**Milestone:** 4.7.RC1
**Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt
**Last Updated:** Mon Oct 05, 2015 08:27 AM UTC
**Owner:** Anders Bjornerstedt
[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7
- **status**: review --> accepted --- ** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7** **Status:** accepted **Milestone:** 4.7.RC1 **Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt **Last Updated:** Fri Oct 02, 2015 11:04 AM UTC **Owner:** Anders Bjornerstedt
[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7
- **status**: accepted --> review

---

** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7**

**Status:** review
**Milestone:** 4.7.RC1
**Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt
**Last Updated:** Mon Oct 05, 2015 08:27 AM UTC
**Owner:** Anders Bjornerstedt
[tickets] [opensaf:tickets] #1503 IMM: Augumented CCb client went down the OM client should get err
There is also a timeout on the OI callback (the create callback harboring the augmentation). Normally the OI timeout is shorter than IMMA_SYNCR_TIMEOUT, so the OM client should normally get an error on the ccb-create downcall before its own timeout expires. But if the OM client has reduced the syncr timeout, or the OI has increased its OI timeout, then you could end up getting ERR_TIMEOUT on the OI side, without anything being wrong anywhere.

---

** [tickets:#1503] IMM: Augumented CCb client went down the OM client should get err**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Fri Sep 25, 2015 09:18 AM UTC by Neelakanta Reddy
**Last Updated:** Fri Sep 25, 2015 09:18 AM UTC
**Owner:** Neelakanta Reddy

The OM is on node1 and the OI is on node2. The OM creates an object. In the create callback the OI augments the CCB by creating an object, and then the OI client goes down. The CCB gets aborted in the IMM database, but the OM create API does not get a return value; after SYNC_TIMEOUT the OM API receives TIME_OUT.
[tickets] [opensaf:tickets] #1503 IMM: Augumented CCb client went down the OM client should get err
Yes I forgot that the OI callback timeout gets disabled by a ccb augmentation inside the callback.

---

** [tickets:#1503] IMM: Augumented CCb client went down the OM client should get err**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Fri Sep 25, 2015 09:18 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Oct 05, 2015 10:09 AM UTC
**Owner:** Neelakanta Reddy
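The interplay of the two timers discussed in this thread can be sketched as a small decision function. This is only an illustrative model, not OpenSAF code: `CreateOutcome` and `outcomeWhenOiSilent` are hypothetical names, the timer values are arbitrary, and the tie-breaking rule for equal timeouts is an assumption.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::seconds;

// Hypothetical outcome labels; these are not OpenSAF identifiers.
enum class CreateOutcome { OiError, OmTimeout };

// What the blocked OM ccb-create downcall observes when the OI never
// answers the create callback.
CreateOutcome outcomeWhenOiSilent(seconds oiTimeout, seconds omSyncrTimeout,
                                  bool augmentationInCallback) {
    // Per the follow-up comment, an augmentation inside the callback
    // disables the OI callback timeout, so only the OM timer remains.
    if (augmentationInCallback) {
        return CreateOutcome::OmTimeout;
    }
    // Otherwise the shorter timer decides (ties go to the OM timer here,
    // which is an arbitrary choice for this sketch).
    return (oiTimeout < omSyncrTimeout) ? CreateOutcome::OiError
                                        : CreateOutcome::OmTimeout;
}
```

With the augmentation flag set, the function returns `OmTimeout` regardless of the timer values, which corresponds to the behaviour reported in the ticket.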
[tickets] [opensaf:tickets] #1499 IMM: Update immsv/README describing imm enhancements in 4.7
- **status**: accepted --> review

---

** [tickets:#1499] IMM: Update immsv/README describing imm enhancements in 4.7**

**Status:** review
**Milestone:** 4.7.RC1
**Created:** Thu Sep 24, 2015 10:55 AM UTC by Anders Bjornerstedt
**Last Updated:** Thu Oct 01, 2015 02:37 PM UTC
**Owner:** Anders Bjornerstedt
[tickets] [opensaf:tickets] #1425 IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT
- **status**: assigned --> unassigned
- **Milestone**: future --> 5.0

---

** [tickets:#1425] IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT**

**Status:** unassigned
**Milestone:** 5.0
**Created:** Fri Jul 24, 2015 12:49 PM UTC by Anders Bjornerstedt
**Last Updated:** Wed Sep 23, 2015 04:23 PM UTC
**Owner:** Hung Nguyen

The saImmOmClassCreate_2() API allows the user to provide a list of attribute definitions. An attribute definition may include a default value. The default value will be assigned to this attribute in an instance being created by the saImmOmCcbObjectCreate_2() or the saImmOiRtObjectCreate_2() APIs, if the user does not provide a value for that attribute.

But a user/OI may later update such an object/attribute assigning the empty value to the attribute. So the default value mechanism is only effective for object creation and not later in the life cycle of the object. This makes the default attribute value mechanism weaker than some users would like.

This enhancement proposes a new attribute flag SA_IMM_ATTR_STRONG_DEFAULT. This flag will only be allowed to be set on an attribute definition that includes a default value. The meaning of the flag is that if a user attempts an update of an object/attribute that assigns the empty value to such an attribute, then the IMM will replace, i.e. override, that value with the default value defined in the class.
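The proposed semantics can be sketched in a few lines. This is a minimal model under stated assumptions, not the IMM implementation: `AttrDef`, `isValidDef` and `valueAfterUpdate` are hypothetical names, and an empty vector stands in for the empty value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// An empty vector models the empty value of an attribute.
using Values = std::vector<std::string>;

// Hypothetical attribute definition; not the IMM data model.
struct AttrDef {
    Values defaultValue;  // empty means no default is defined
    bool strongDefault;   // models SA_IMM_ATTR_STRONG_DEFAULT
};

// The flag is only allowed on a definition that includes a default value.
bool isValidDef(const AttrDef& def) {
    return !def.strongDefault || !def.defaultValue.empty();
}

// Value stored after an update request.
Values valueAfterUpdate(const AttrDef& def, const Values& requested) {
    // Assigning the empty value to a strong-default attribute is
    // overridden with the class default.
    if (requested.empty() && def.strongDefault && !def.defaultValue.empty()) {
        return def.defaultValue;
    }
    return requested;
}
```

Without the flag, the same update leaves the attribute empty, which is the weaker behaviour the ticket describes.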
[tickets] [opensaf:tickets] #1425 IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT
- **status**: unassigned --> accepted

---

** [tickets:#1425] IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT**

**Status:** accepted
**Milestone:** 5.0
**Created:** Fri Jul 24, 2015 12:49 PM UTC by Anders Bjornerstedt
**Last Updated:** Thu Oct 01, 2015 10:38 AM UTC
**Owner:** Hung Nguyen
[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client
- **Comment**:

I have a problem with this ticket. Appliers are intentionally not synced. They should not need to be synced. The question here is how you manage to execute a sync with a CCB being active. Non-empty CCBs are terminated before the actual sync can start. So a bug seems to have been introduced somewhere.

---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 04:27 AM UTC
**Owner:** Hung Nguyen

Set an applier to a class. Then exit immapplier to detach the applier.

    root@SC1:~# immapplier -a @whatever Test

Let another node join the cluster.

Create a CCB which is active on an object of 'Test' class. Don't commit the CCB.

    root@SC1:~# immcfg
    > immcfg -c Test test=1
    >

Try to set applier again.

    root@SC1:/srv/shared# immapplier -a @whatever Test
    Implementer: @whatever
    ImmVersion: A 2 16
    error - saImmOiImplementerSet FAILED: SA_AIS_ERR_TRY_AGAIN (6)

SC-1

    osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
    osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 225
    osafimmnd [419:ImmModel.cc:12967] >> implementerSet
    osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
    osafimmnd [419:ImmModel.cc:13040] TR TRY_AGAIN: ccb 2 is active on object 'test=1' bound to class applier '@whatever'. Can not re-attach applier
    osafimmnd [419:ImmModel.cc:13156] << implementerSet

PL-3

    osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
    osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 225
    osafimmnd [392:ImmModel.cc:12967] >> implementerSet
    osafimmnd [392:ImmModel.cc:13008] T7 Re-using implementer for @whatever
    osafimmnd [392:ImmModel.cc:13087] NO Implementer (applier) connected: 5 (@whatever) <0, 2010f>
    osafimmnd [392:ImmModel.cc:13156] << implementerSet

IMMND on SC-1 rejected the implSet request but IMMND on PL-3 accepted it. The applier was not synced to PL-3 (mAppliers.empty() returned true), so the implSet request passed the ccb check.

    if( ! obj->mClassInfo->mAppliers.empty()) {
        ImplementerSet::iterator ii = obj->mClassInfo->mAppliers.begin();
        for(; ii != obj->mClassInfo->mAppliers.end(); ++ii) {
            if((*ii) == info) {
                TRACE("TRY_AGAIN: ccb %u is active on object '%s' "
                      "bound to class applier '%s'. Can not re-attach applier",
                      ccb->mId, omit->first.c_str(), implName.c_str());
                err = SA_AIS_ERR_TRY_AGAIN;
                goto done;
            }
        }
    }

Now commit the CCB and try to set the applier again.

SC-1

    osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
    osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 226
    osafimmnd [419:ImmModel.cc:12967] >> implementerSet
    osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
    osafimmnd [419:ImmModel.cc:13087] NO Implementer (applier) connected: 6 (@whatever) <226, 2010f>
    osafimmnd [419:ImmModel.cc:13156] << implementerSet

PL-3

    osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
    osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 226
    osafimmnd [392:ImmModel.cc:12967] >> implementerSet
    osafimmnd [392:ImmModel.cc:13003] T7 ERR_EXIST: Registered implementer already exists: @whatever
    osafimmnd [392:ImmModel.cc:13005] << implementerSet

The applier had different ids on SC-1 and PL-3. When a new node joins the cluster, IMMND on PL-3 will crash when verifying the implementers.

    PL3 osafimmnd[392]: ER Sync-verify: Established node has different Implementer-id: 5 for name: @whatever, sync says 6.
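The divergence reported above can be reproduced with a toy model of the check. This is a sketch only: `NodeView` and `acceptsImplementerSet` are hypothetical names that loosely mirror the quoted snippet, and they are not part of ImmModel.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical per-node view of class 'Test'; not the ImmModel structures.
struct NodeView {
    std::set<std::string> classAppliers;  // appliers this node has recorded for the class
    bool ccbActiveOnObject;               // an uncommitted CCB touches an object of the class
};

// Returns true when the node accepts the implementer-set request and
// false when it would answer TRY_AGAIN.
bool acceptsImplementerSet(const NodeView& node, const std::string& implName) {
    // The CCB check only fires if this node knows the class applier.
    if (node.ccbActiveOnObject && node.classAppliers.count(implName) != 0) {
        return false;
    }
    return true;
}
```

A node that recorded `@whatever` as class applier rejects the request, while a freshly synced node with an empty applier set accepts it. That is the SC-1 versus PL-3 split seen in the traces.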
[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client
With the above solution there is the issue that the check is then not done in fevs order. By the time the implementer-set arrives over fevs at all nodes, a ccb-operation that interferes may have been created, resulting in the implementer-set having to be aborted anyway. The local immnd thus has to run the applier checks again when it receives the implementer-set over fevs. If that check fails, it rejects the operation, replies with an error to the client, and broadcasts an implementer_clear over fevs.

---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 09:26 AM UTC
**Owner:** Hung Nguyen
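The re-check on fevs arrival described in this comment might look roughly like the following. This is a hedged sketch: `FevsOutcome` and `onImplementerSetArrival` are hypothetical names, and the assumption that remote nodes attach without running the applier check is an inference from the comment, not confirmed behaviour.

```cpp
#include <cassert>

// Hypothetical summary of what a node does when the implementer-set
// message arrives over fevs.
struct FevsOutcome {
    bool attached;                   // applier attached on this node
    bool errorRepliedToClient;       // error returned to the requesting client
    bool implementerClearBroadcast;  // implementer_clear sent over fevs
};

FevsOutcome onImplementerSetArrival(bool originatedHere, bool ccbInterferes) {
    // Assumption: only the originating node repeats the applier check.
    if (!originatedHere) {
        return {true, false, false};
    }
    // A CCB operation slipped in between the local check and fevs delivery:
    // reject, tell the client, and undo the attach on all nodes.
    if (ccbInterferes) {
        return {false, true, true};
    }
    return {true, false, false};
}
```

The broadcast matters because, in this model, remote nodes have already attached the applier by the time the originating node rejects it.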
[tickets] [opensaf:tickets] #1504 imm: Implicit class/object-applier checked by OiImplementeSet is incorrect
- **summary**: imm: Appliers for classes and objects are not synced to sync-client --> imm: Implicit class/object-applier checked by OiImplementeSet is incorrect

---

** [tickets:#1504] imm: Implicit class/object-applier checked by OiImplementeSet is incorrect**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 09:38 AM UTC
**Owner:** Hung Nguyen
[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client
The applier names are synced, but the class/object-applier data is not synced. That is intentional, and I don't want a solution that tries to sync all applier information to all nodes. The class-applier and object-applier mechanism is inherently local, i.e. only used at the node where the applier exists. Remember that an applier is a listener and not a true participant in CCBs, so its existence should only matter locally. The only thing that is global is the existence of an applier with a certain name and the current location, if any, of that exact applier with that name.

Having said that, it is still important that the local class/object applier is not allowed to attach in such a way that it can see an incomplete CCB. I am thinking about what the best approach for a fix would be. Don't start on some complex implementation of this yet.

---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 07:19 AM UTC
**Owner:** Hung Nguyen
[tickets] [opensaf:tickets] #1504 imm: Appliers for classes and objects are not synced to sync-client
The problem is the feature of *implicit* class-implementer-set and *implicit* object-implementer-set. Ironically, this feature is practically useless for appliers.

One possible (and relatively simple) solution would be to only do the ccb interference checks for *appliers* at the node where the applier is actually attaching. That would almost be in fevs_local_checks, except that implementer-set is not a regular fevs message at the sending side. So instead it would be in immnd_evt_proc_impl_set in immnd_evt.c. If the check fails, the local IMMND simply rejects the request with TRY_AGAIN (ERR_BUSY would in reality be better here, since the immsv has no control over how long the wait will be). The current applier check at the fevs receiving side for implementer-set is simply removed.

---

** [tickets:#1504] imm: Appliers for classes and objects are not synced to sync-client**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Sep 28, 2015 04:15 AM UTC by Hung Nguyen
**Last Updated:** Mon Sep 28, 2015 08:44 AM UTC
**Owner:** Hung Nguyen
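The local pre-check proposed in this comment could be sketched as follows. The names `LocalVerdict`, `isApplierName` and `localApplierPreCheck` are hypothetical and are not taken from immnd_evt.c. Treating a leading `@` as the marker of an applier name is an assumption based on the `@whatever` example in the ticket.

```cpp
#include <cassert>
#include <string>

// Hypothetical verdicts of the attaching node before it forwards the request.
enum class LocalVerdict { Forward, TryAgain };

// Assumption: applier names are recognised by a leading '@'.
bool isApplierName(const std::string& implName) {
    return !implName.empty() && implName[0] == '@';
}

// Check done only at the node where the applier is attaching.
LocalVerdict localApplierPreCheck(const std::string& implName,
                                  bool ccbActiveOnBoundObject) {
    // Only appliers get the CCB interference check in this sketch.
    if (isApplierName(implName) && ccbActiveOnBoundObject) {
        return LocalVerdict::TryAgain;
    }
    return LocalVerdict::Forward;
}
```

Because the decision uses only locally held class/object-applier data, it does not depend on that data being synced to other nodes.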
[tickets] [opensaf:tickets] #1313 osaf: opensaf does not start when long dn object is present in imm.db and cluster is reset
- **status**: unassigned --> duplicate
- **Component**: osaf --> log
- **Version**: 4.6 FC --> 4.6
- **Milestone**: 4.5.1 --> never
- **Comment**:

Duplicate of #1452 which is fixed.
https://sourceforge.net/p/opensaf/tickets/1452/

---

** [tickets:#1313] osaf: opensaf does not start when long dn object is present in imm.db and cluster is reset**

**Status:** duplicate
**Milestone:** never
**Created:** Mon Apr 13, 2015 08:57 AM UTC by Sirisha Alla
**Last Updated:** Fri Aug 14, 2015 12:39 PM UTC
**Owner:** Mathi Naickan
**Attachments:**

- [slot1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1313/attachment/slot1.tar.bz2) (269.6 kB; application/x-bzip)

This is observed on changeset 6377 (46FC Tag). The system is up with single pbe and 50k objects. Long dns was enabled. There is one long dn object in the cluster.

Syslog on SC-1:

    Apr 9 15:49:14 SLES-64BIT-SLOT1 osafimmnd[10731]: WA Setting attr longDnsAllowed to 0 in opensafImm=opensafImm,safApp=safImmService not allowed when long RDN exists inside object: xattrName_testAdminOwnerClear_SubLevelScope_1011

Now the cluster is reset. Nodes in the cluster fail to come up with the following reason:

    Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Persistent Back End OI attached, pid: 3465
    Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Implementer connected: 1 (OpenSafImmPBE) <10, 2010f>
    Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO implementer for class 'OpensafImm' is OpenSafImmPBE => class extent is safe.
    Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 20 committing with ccbId:10003/4294967299
    Apr 13 13:04:56 SLES-64BIT-SLOT1 osafimmnd[3439]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
    Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from LOGD
    Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER
    Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Going for recovery
    Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-logd attempt #1
    Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, pid=3452
    Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: Started
    Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: WA read_logsv_configuration(). All attributes could not be read
    Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO Log config system: high 0 low 0, application: high 0 low 0
    Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO log root directory is: /var/log/opensaf/saflog
    Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LOG data group is:
    Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LGS_MBCSV_VERSION = 4
    Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: saImmOmSearchInitialize FAILED, rc = 13
    Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from LOGD
    Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD
    Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER
    Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-logd attempt #2
    Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, pid=3495
    Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: Started
    Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: WA read_logsv_configuration(). All attributes could not be read
    Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO Log config system: high 0 low 0, application: high 0 low 0
    Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO log root directory is: /var/log/opensaf/saflog
    Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LOG data group is:
    Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LGS_MBCSV_VERSION = 4
    Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: saImmOmSearchInitialize FAILED, rc = 13
    Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from LOGD
    Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD
    Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER
    Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER FAILED TO RESPAWN
    Apr 13 13:07:24 SLES-64BIT-SLOT1 osaffmd[3419]: exiting for shutdown
    Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmd[3429]: exiting for shutdown
    Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmnd[3439]: NO No IMMD service => cluster restart, exiting
    Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
    Apr 13 13:07:24 SLES-64BIT-SLOT1 osafrded[3410]: exiting for shutdown
    Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782513] TIPC: Disabling bearer
    Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782518] TIPC: Lost link <1.1.1:eth0-1.1.4:eth0> on network plane A
    Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782521] TIPC: Lost contact with <1.1.4>
    Apr 13
[tickets] [opensaf:tickets] #1494 imm: missmatch in obj_create_rsp event type
- **Version**: --> 4.7

---

** [tickets:#1494] imm: missmatch in obj_create_rsp event type**

**Status:** review
**Milestone:** 4.7.FC
**Created:** Tue Sep 22, 2015 05:52 AM UTC by Neelakanta Reddy
**Last Updated:** Tue Sep 22, 2015 06:06 AM UTC
**Owner:** Neelakanta Reddy

immnd_evt.c:3756: immnd_evt_proc_ccb_obj_create_rsp: Assertion 'evt->type == IMMND_EVT_A2ND_CCB_OBJ_MODIFY_RSP_2' failed.

The create_rsp path uses IMMND_EVT_A2ND_CCB_OBJ_MODIFY_RSP_2 where it should use IMMND_EVT_A2ND_CCB_OBJ_CREATE_RSP.
[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
- **status**: unassigned --> invalid
- **Comment**:

If the problem is to be declared as a configuration error, i.e. solvable by adjusting one or more configuration values that are documented by OpenSAF, then the ticket should be closed as invalid.

---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync**

**Status:** invalid
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Mon Sep 21, 2015 04:47 AM UTC
**Owner:** nobody
**Attachments:**

- [immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2) (6.8 MB; application/x-bzip)

The issue is observed with 4.6 FC changeset 6377. The system is up and running with a single PBE and 50k objects.

This issue is seen after http://sourceforge.net/p/opensaf/tickets/1290 is observed. An IMM application is running on the standby controller, and the immcfg command is run from a payload to set the CompRestartMax value to 1000. IMMND is killed twice on the standby controller, leading to #1290. As a result, the standby controller left the cluster in the middle of the sync, IMMD reported a healthcheck callback timeout, and the active controller also went for reboot.
Following is the syslog of SC-1: Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 2020f: Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60 Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with <1.1.2> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. 
Not sending track callback for agents on that node Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=2197', processed='center(received)=1172', processed='destination(messages)=1172', processed='destination(mailinfo)=0', processed='destination(mailwarn)=0', processed='destination(localmessages)=955', processed='destination(newserr)=0', processed='destination(mailerr)=0', processed='destination(netmgm)=0', processed='destination(warn)=44', processed='destination(console)=13', processed='destination(null)=0', processed='destination(mail)=0', processed='destination(xconsole)=13', processed='destination(firewall)=0', processed='destination(acpid)=0', processed='destination(newscrit)=0', processed='destination(newsnotice)=0', processed='source(src)=1172' Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on saImmOmSearchNext - aborting Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED status:1 Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE (2484) Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting ABORT_SYNC, epoch:12 Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing with ccbId:10054/4294967380 Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation timer started (timeout: 12000 ns) Mar 26 15:01:34 
SLES-64BIT-SLOT1 osafamfnd[9638]: NO Performing failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1) Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated
[tickets] [opensaf:tickets] #1269 IMM: Library side behavior at failure to allocatge memory needs to be consistent
- **Milestone**: 4.7.FC --> future

---

** [tickets:#1269] IMM: Library side behavior at failure to allocatge memory needs to be consistent**

**Status:** assigned
**Milestone:** future
**Created:** Tue Mar 17, 2015 10:24 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 04:08 PM UTC
**Owner:** Hung Nguyen

The IMM library/agent side (IMMA) should behave consistently and follow a consistent coding pattern for dealing with the case of failure to allocate memory. The IMMA library is linked with an application process that is using the IMM service. Failure to allocate memory is rare and means that the processor where the application is executing is overloaded. Because the IMMA library is hosted by an application, there is some merit in returning control to the application, letting it decide how to escalate. This is "nice" towards the application, making troubleshooting simpler for those responsible for the application.

In terms of coding, the simplest solution possible should be used. The allowed solutions in coding on the IMM library/agent side should be:

a) Return SA_AIS_ERR_NO_MEMORY
b) osafassert the pointer after malloc/calloc/strdup
c) Nothing, i.e. segv at the next dereference.

where (a) is recommended when the allocation error occurs close to the API; (b) is recommended in deeper levels of function invocation; (c) is allowed in legacy library code, but should be avoided in new/updated code.

We need to allow (c) in the agent/library, otherwise this ticket would be a defect ticket. Writing explicit if statements checking for null and writing explicit customized syslog error messages, or trace messages, is not allowed in the library for the memory allocation failure case. Osafassert does write to the syslog, but that is an allowed exception here.
[tickets] [opensaf:tickets] #1474 imm: Assigning default value to no-dangling attributes make cluster fail to start
- **status**: unassigned --> assigned
- **assigned_to**: Hung Nguyen

---

** [tickets:#1474] imm: Assigning default value to no-dangling attributes make cluster fail to start**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Thu Sep 10, 2015 02:27 PM UTC by Hung Nguyen
**Last Updated:** Mon Sep 14, 2015 09:39 AM UTC
**Owner:** Hung Nguyen

root@SC1:~# immlist -c Test
<< Test - CONFIG >>
test : SA_STRING_T [1] {RDN, CONFIG, INITIALIZED}
dep : SA_NAME_T [0] = test=1 (6) {CONFIG, WRITEABLE, NO_DANGLING}

Create test=1 and test=2

root@SC1:~# immcfg -c Test test=1
root@SC1:~# immcfg -c Test test=2

Set the attribute with default value to empty.

root@SC1:~# immcfg -a dep= test=2
root@SC-1:~# immlist -a dep test=2
dep=

Now test=1 can be deleted

root@SC1:~# immcfg -d test=1

Reboot cluster and it will fail to start

Sep 10 21:03:36 SC1 osafimmloadd: NO * Loading from PBE file imm.db at /srv/shared/imm/ *
Sep 10 21:03:40 SC1 osafimmnd[421]: NO ERR_FAILED_OPERATION: NO_DANGLING reference (test=1) is dangling (Ccb 1)
Sep 10 21:03:40 SC1 osafimmnd[421]: NO Ccb 1 ABORTED (IMMLOADER)

[#1377]
[tickets] [opensaf:tickets] #1474 imm: Assigning default value to no-dangling attributes make cluster fail to start
- **Priority**: major --> critical
- **Comment**:

Raising severity to critical since the symptom caused by this defect is severe.
I propose that the solution is to not allow default values to be defined for attributes flagged with NO_DANGLING in the class definition.

---

** [tickets:#1474] imm: Assigning default value to no-dangling attributes make cluster fail to start**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Thu Sep 10, 2015 02:27 PM UTC by Hung Nguyen
**Last Updated:** Thu Sep 10, 2015 02:27 PM UTC
**Owner:** nobody

root@SC1:~# immlist -c Test
<< Test - CONFIG >>
test : SA_STRING_T [1] {RDN, CONFIG, INITIALIZED}
dep : SA_NAME_T [0] = test=1 (6) {CONFIG, WRITEABLE, NO_DANGLING}

Create test=1 and test=2

root@SC1:~# immcfg -c Test test=1
root@SC1:~# immcfg -c Test test=2

Set the attribute with default value to empty.

root@SC1:~# immcfg -a dep= test=2
root@SC-1:~# immlist -a dep test=2
dep=

Now test=1 can be deleted

root@SC1:~# immcfg -d test=1

Reboot cluster and it will fail to start

Sep 10 21:03:36 SC1 osafimmloadd: NO * Loading from PBE file imm.db at /srv/shared/imm/ *
Sep 10 21:03:40 SC1 osafimmnd[421]: NO ERR_FAILED_OPERATION: NO_DANGLING reference (test=1) is dangling (Ccb 1)
Sep 10 21:03:40 SC1 osafimmnd[421]: NO Ccb 1 ABORTED (IMMLOADER)

[#1377]
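The proposal in the comment (reject default values on NO_DANGLING attributes at class-definition time) could look roughly like the check below. This is a sketch only: the dictionary representation of a class definition and the function name are hypothetical, and the real IMM class-create validation is not shown in this thread.

```python
def no_dangling_attrs_with_default(class_def):
    """Return names of attributes flagged NO_DANGLING that also define a default.

    `class_def` maps attribute name -> {"type", "flags", optional "default"}.
    This layout is an assumption made for the example.
    """
    return [
        name
        for name, attr in class_def.items()
        if "NO_DANGLING" in attr["flags"] and attr.get("default") is not None
    ]


# The class from the reproduction steps above, written in the toy layout.
test_class = {
    "test": {"type": "SA_STRING_T", "flags": {"RDN", "CONFIG", "INITIALIZED"}},
    "dep": {
        "type": "SA_NAME_T",
        "flags": {"CONFIG", "WRITEABLE", "NO_DANGLING"},
        "default": "test=1",
    },
}

offending = no_dangling_attrs_with_default(test_class)
print(offending)  # a non-empty list means the class definition would be rejected
```

Under this rule the class `Test` would be refused when it is created, so the later sequence (empty the attribute, delete the target, reload) could not arise.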
[tickets] [opensaf:tickets] #1472 imm: Default values are assigned to empty-valued attributes when sync
- **Priority**: major --> critical
- **Comment**:

Raising severity to critical for this ticket since the symptom is an inconsistency in the imm database between nodes.

---

** [tickets:#1472] imm: Default values are assigned to empty-valued attributes when sync**

**Status:** review
**Milestone:** 4.5.2
**Created:** Wed Sep 09, 2015 11:01 AM UTC by Hung Nguyen
**Last Updated:** Thu Sep 10, 2015 02:30 PM UTC
**Owner:** Hung Nguyen

root@SC-1:~# immlist -c Test
<< Test - CONFIG >>
test : SA_STRING_T [1] {RDN, CONFIG, INITIALIZED}
attr : SA_INT64_T [0] = 100 (0x64) {CONFIG, WRITEABLE}

On SC-1, set attribute of an object that has default to empty.

root@SC-1:~# immcfg -c Test test=1
root@SC-1:~# immcfg -a attr= test=1
root@SC-1:~# immlist -a attr test=1
attr=

Let another node join the cluster. On that node, list value of the attribute

root@PL-3:~# immlist -a attr test=1
attr=100

There's a mismatch between the nodes.
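One way such a mismatch can arise is sketched below. The ticket only reports the symptom, so the mechanism here is an assumption: the sync encoding drops empty-valued attributes, and the receiving node then treats them as "not supplied" and applies the class default. All names are hypothetical.

```python
_MISSING = object()  # sentinel: attribute was not supplied at all


def create_object(class_defaults, values):
    """Build an object, applying a default only when the attribute is absent."""
    obj = {}
    for attr, default in class_defaults.items():
        value = values.get(attr, _MISSING)
        obj[attr] = default if value is _MISSING else value
    return obj


def sync_payload_lossy(obj):
    """Assumed faulty encoding: empty (None) attributes are left out."""
    return {k: v for k, v in obj.items() if v is not None}


def sync_payload_exact(obj):
    """Encoding that keeps empty attributes as explicit None entries."""
    return dict(obj)


defaults = {"attr": 100}
on_sc1 = create_object(defaults, {})   # immcfg -c Test test=1  -> attr=100
on_sc1["attr"] = None                  # immcfg -a attr= test=1 -> attr is empty

on_pl3_lossy = create_object(defaults, sync_payload_lossy(on_sc1))
on_pl3_exact = create_object(defaults, sync_payload_exact(on_sc1))
print(on_pl3_lossy, on_pl3_exact)
```

The point of the sketch is that "empty" and "absent" have to stay distinguishable all the way through the sync, otherwise the default is applied a second time on the joining node.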
[tickets] [opensaf:tickets] Re: #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES
End users of recent releases, i.e. previous releases, have not seen this problem. At least no report of the problem had been created until a few weeks ago. It only occurs in overload situations and has only been seen in recent testing with an overloaded system.

New ways of testing are always good. But testing overload on a system with no load regulation will always find the next bottleneck symptom. We can play that game indefinitely, adding defect upon defect. Or we can provide some form of load-regulation mechanism for OpenSAF.

It is also ironic that we need to fix this particular overload issue on old releases at the same time as we are ripping up existing time release plans and suddenly declaring we are going to one-track development.

Personally I am increasingly frustrated by the deterioration in following the rules of the ticket system. Why not just drop the distinction between enhancement and defect? No one seems to care (or bother) about this distinction any more. The main reason for the distinction (I thought) was to provide an increased degree of stability on older branches. New features always mean new risk, at least in the short term, i.e. the first release occurrence of a new feature (enhancement). But no one seems to care about that.

/AndersBj

From: Mathi Naickan [mailto:mathi-naic...@users.sf.net]
Sent: 25 August 2015 16:37
To: opensaf-tickets@lists.sourceforge.net
Subject: [tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

I think it is more unfair to the end users of recent releases to withhold the benefit of an optimization or fix for an issue just because it was uncovered/hit late, especially when the fix does no harm and only helps the campaign succeed. Maybe in the case of this ticket, there is more to help the user and nothing to harm the code path!
Also, the facts that this is not a newly introduced error code, and that IMM API users have not met the expectation set by IMM to handle it like TRY_AGAIN, call for this to be a defect.

[tickets:#1448] http://sourceforge.net/p/opensaf/tickets/1448/ smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

Status: unassigned
Milestone: future
Created: Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
Last Updated: Tue Aug 25, 2015 11:14 AM UTC
Owner: nobody

The SMF service is a heavy user of the IMM service. The IMM has an established client pattern for ERR_TRY_AGAIN which allows an application realtime control over how long it is prepared to wait for a transient inability of the IMM service to fulfill a request. Each response of TRY_AGAIN should in itself be fast, so the application needs a delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES. Logically that error code is identical to TRY_AGAIN in that the request could not be accepted due to no fault of the client but due to some more or less temporary problem in the IMM service. The difference is that NO_RESOURCES has no realtime ambitions. Typically this error code is used by the IMM when the IMM cannot fulfill a request due to reasons that are outside of the IMM service's control. Also, the time from request to a response of ERR_NO_RESOURCES may be long.

The SMF service in general has no realtime requirements. The main goal for the SMF service is to successfully complete correctly formulated campaigns. This means that the SMF service should be programmed to avoid unnecessary fragility related to temporary problems, even if the temporary problem could linger for seconds or minutes. The alternative of aborting the campaign will itself discard potentially large execution times already completed. It may sometimes even result in a system restore.
This means that SMF campaigns should have a retry loop that handles not just TRY_AGAIN, but also ERR_NO_RESOURCES where this return code is relevant (can be returned according to the API spec).

The error code ERR_BUSY also exists and is for all practical purposes identical to ERR_NO_RESOURCES in semantics, both logical and timing.

---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 03:03 PM UTC
**Owner:** nobody

The SMF service is a heavy user of the IMM service. The IMM has an established client pattern for ERR_TRY_AGAIN
[tickets] [opensaf:tickets] Re: #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES
Yes, the principle about handling ERR_NO_RESOURCES should be the same everywhere over all SAF services, just as the rules for handling TRY_AGAIN should be the same over all OpenSAF services. Any client application is free to decide not to handle these errors, i.e. to stop trying if it gets them. But applications can be made more robust by handling these errors.

There is also ERR_BUSY, which for the immsv works exactly the same way as ERR_NO_RESOURCES. SAF created too many error codes as I see it. There should only be one error code for any particular handling behavior defined as appropriate for the error. If two error codes are to be handled exactly the same, then one of the error codes should be deprecated.

/AndersBJ

From: Mathi Naickan [mailto:mathi-naic...@users.sf.net]
Sent: 25 August 2015 17:03
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

Just as a note: previously, I had a discussion with Ingvar and he had agreed to convert this into a defect. I can provide a patch for this for OM API calls except for the CCB APIs (based on the description above). Should we also give this treatment to OI APIs?

[tickets:#1448] http://sourceforge.net/p/opensaf/tickets/1448/ smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES

Status: unassigned
Milestone: future
Created: Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
Last Updated: Tue Aug 25, 2015 02:37 PM UTC
Owner: nobody

The SMF service is a heavy user of the IMM service. The IMM has an established client pattern for ERR_TRY_AGAIN which allows an application realtime control over how long it is prepared to wait for a transient inability of the IMM service to fulfill a request. Each response of TRY_AGAIN should in itself be fast, so the application needs a delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES.
Logically that error code is identical to TRY_AGAIN in that the request could not be accepted due to no fault of the client but due to some more or less temporary problem in the IMM service. The difference is that NO_RESOURCES has no realtime ambitions. Typically this error code is used by the IMM when the IMM cannot fulfill a request due to reasons that are outside of the IMM service's control. Also, the time from request to a response of ERR_NO_RESOURCES may be long.

The SMF service in general has no realtime requirements. The main goal for the SMF service is to successfully complete correctly formulated campaigns. This means that the SMF service should be programmed to avoid unnecessary fragility related to temporary problems, even if the temporary problem could linger for seconds or minutes. The alternative of aborting the campaign will itself discard potentially large execution times already completed. It may sometimes even result in a system restore.

This means that SMF campaigns should have a retry loop that handles not just TRY_AGAIN, but also ERR_NO_RESOURCES where this return code is relevant (can be returned according to the API spec).

The error code ERR_BUSY also exists and is for all practical purposes identical to ERR_NO_RESOURCES in semantics, both logical and timing.

---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 03:03 PM UTC
**Owner:** nobody

The SMF service is a heavy user of the IMM service.
The IMM has an established client pattern for ERR_TRY_AGAIN which allows an application realtime control over how long it is prepared to wait for a transient inability of the IMM service to fulfill a request. Each response of TRY_AGAIN should in itself be fast, so the application needs a delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES. Logically that error code is identical to TRY_AGAIN in that the request could not be accepted due to no fault of the client but due to some more or less temporary problem in the IMM service. The difference is that NO_RESOURCES has no realtime ambitions. Typically this error code is used by the IMM when the IMM cannot fulfill a request due to reasons that are outside of the IMM service's control. Also, the time from request to a response of ERR_NO_RESOURCES may be long.

The SMF service in general has no realtime requirements. The main goal for the SMF service is to successfully complete correctly formulated campaigns. This means that the SMF service should be programmed to avoid unnecessary fragility related to temporary problems, even
[tickets] [opensaf:tickets] #1458 AMF: Not possible to add/remove configuration for one node in one single CCB
- **summary**: Not possible to add/remove configuration for one node in one single CCB --> AMF: Not possible to add/remove configuration for one node in one single CCB

---

** [tickets:#1458] AMF: Not possible to add/remove configuration for one node in one single CCB**

**Status:** accepted
**Milestone:** 4.6.1
**Created:** Tue Aug 25, 2015 05:11 AM UTC by Gary Lee
**Last Updated:** Tue Aug 25, 2015 05:24 AM UTC
**Owner:** Gary Lee

When adding a node to scale out, it is not possible to do that in one CCB: the safAmfNode cannot be created together with the rest of the configuration.

When removing a node, multiple CCBs are needed since the safAmfNode cannot be deleted and removed from the safAmfNodeGroups in the same CCB. Deleting the safAmfNode from safAmfNodeGroups in the same CCB as the rest of the configuration that needs to be removed/updated is not possible either.

SCALE_OUT: two CCBs work, but creating everything in only one CCB fails:

error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: SG or node not configured properly to allow creation of UNLOCKED SU

2171 16:32:40 06/10/2015 WA safApp=safAmfService Create 'safSu=dae1f5de53,safSg=NWayActive,safApp=ABC', configured with a non existing node (safAmfNode=PL-6,safAmfCluster=myAmfCluster)
2172 16:32:40 06/10/2015 NO safApp=safAmfService CCB 575 creation of 'safSu=dae1f5de53,safSg=NWayActive,safApp=ABC' failed
2173 16:32:40 06/10/2015 NO safApp=safAmfService CCB 575 validation error: SG or node not configured properly to allow creation of UNLOCKED SU
2174 16:32:42 06/10/2015 WA safApp=safAmfService Create 'safSu=dae1f5de53,safSg=NWayActive,safApp=ABC', configured with a non existing node (safAmfNode=PL-6,safAmfCluster=myAmfCluster)

NOTE: 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' is created first in the CCB!
ccb.add:
immcfg -c SaAmfNode safAmfNode=PL-6,safAmfCluster=myAmfCluster -a saAmfNodeSuFailoverMax=2 -a saAmfNodeSuFailOverProb=12000 -a saAmfNodeFailfastOnTerminationFailure=1 -a saAmfNodeFailfastOnInstantiationFailure=0 -a saAmfNodeClmNode=safNode=PL-6,safCluster=myClmCluster -a saAmfNodeAutoRepair=1 -a saAmfNodeAdminState=3

SCALE_IN: Only three CCBs work. Two problems below:

1. The scale_in script cannot update node groups even though the dependent SU is deleted in the same CCB:

1627 15:04:06 06/10/2015 NO safApp=safAmfService CCB 477 validation error: Cannot delete 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' from 'safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster'. An SU is mapped using node group

2. Removing the node from the node group and deleting the safAmfNode in the same CCB does not work.

error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' exists in the nodegroup 'safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster'

1689 15:18:49 06/10/2015 NO safApp=safAmfService CCB 488 validation error: 'safAmfNode=PL-6,safAmfCluster=myAmfCluster' exists in the nodegroup 'safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster'
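The behaviour the reporter expects for SCALE_OUT, where an SU may reference a node created earlier in the same CCB, can be illustrated with a toy validator. This is not the AMF director's code: the operation tuples, the function name, and the `see_pending` switch are hypothetical, and the idea that validation ignores objects pending in the CCB is an inference from the log above, not something the ticket states.

```python
def validate_ccb(operations, existing_nodes, see_pending=True):
    """Walk CCB operations in order and return the SUs that fail validation.

    Each operation is a (kind, name, node) tuple (toy format). With
    see_pending=True an SU may use a node created earlier in the same
    CCB; with see_pending=False only already-applied nodes count.
    """
    pending = set()
    rejected = []
    for kind, name, node in operations:
        if kind == "create_node":
            pending.add(name)
        elif kind == "create_su":
            known = set(existing_nodes) | (pending if see_pending else set())
            if node not in known:
                rejected.append(name)
    return rejected


NODE = "safAmfNode=PL-6,safAmfCluster=myAmfCluster"
SU = "safSu=dae1f5de53,safSg=NWayActive,safApp=ABC"
scale_out = [("create_node", NODE, None), ("create_su", SU, NODE)]

accepted_case = validate_ccb(scale_out, existing_nodes=set())
reported_case = validate_ccb(scale_out, existing_nodes=set(), see_pending=False)
print(accepted_case, reported_case)
```

With `see_pending=False` the sketch reproduces the "configured with a non existing node" rejection even though the node is created first in the CCB.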
[tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES
There are at least three points to make in response to the claim that this has to be a defect.

1) We have not seen this problem earlier. So obviously testing is different this time, i.e. this is a new way of testing that was not performed when testing the earlier releases.

2) Saying that this fix is the only way for this campaign to succeed is not true unless you show that the problem is not performance related. I am convinced that the root cause is very much performance related. So the very same campaign most likely succeeds, and probably has succeeded in earlier releases, simply because the platform it was tested on had a more reasonable load/capacity ratio.

3) I have noticed that there is lately a tendency to stress test OpenSAF more often with a higher load/capacity ratio, at least here at Ericsson, due to various reasons. Probably it is related to the more volatile capacity of virtualized and/or cloud based platforms, in particular when they are being reconfigured.

What I am basically saying is that it is always possible to increase the load/capacity ratio until you do see a resource related problem occur in the system. It is a bit unfair to then declare that problem as a defect, particularly when the effect is benign. In this case an SMF campaign gets aborted, but in a controlled way.

OpenSAF has no load regulation, so OpenSAF is currently vulnerable to getting stuck in resource problems. OpenSAF does have partial overload protection in the IMM service, and this is what is getting triggered here (max outstanding fevs messages at the local IMMND, a type of flow control).

On the other hand, if this is really a practical and real problem also for deployments on old OpenSAF releases being used in new ways in *production*, i.e. there is a plan to regularly run with overloaded capacity in production, then one could declare this as a defect, even if it is a bit unfair.
---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 09:28 AM UTC
**Owner:** nobody

The SMF service is a heavy user of the IMM service. The IMM has an established client pattern for ERR_TRY_AGAIN which allows an application realtime control over how long it is prepared to wait for a transient inability of the IMM service to fulfill a request. Each response of TRY_AGAIN should in itself be fast, so the application needs a delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES. Logically that error code is identical to TRY_AGAIN in that the request could not be accepted due to no fault of the client but due to some more or less temporary problem in the IMM service. The difference is that NO_RESOURCES has no realtime ambitions. Typically this error code is used by the IMM when it cannot fulfill a request for reasons that are outside the control of the IMM service. Also, the time from request to a response of ERR_NO_RESOURCES may be long.

The SMF service in general has no realtime requirements. The main goal for the SMF service is to successfully complete correctly formulated campaigns. This means that the SMF service should be programmed to avoid unnecessary fragility related to temporary problems, even if the temporary problem could linger for seconds or minutes. The alternative of aborting the campaign will itself discard potentially large execution times already completed. It may sometimes even result in a system restore.

This means that SMF campaigns should have a retry loop that handles not just TRY_AGAIN, but also ERR_NO_RESOURCES where this return code is relevant (can be returned according to the API spec).

The error code ERR_BUSY also exists and is for all practical purposes identical to ERR_NO_RESOURCES in semantics, both logical and timing.
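The retry loop asked for in this ticket can be sketched as follows. This is a minimal Python simulation, not SMF or IMM code: `Err`, `call_with_retry` and `make_flaky` are hypothetical names, only the error-code names come from the ticket, and the wait budget and delay are arbitrary assumptions.

```python
import time
from enum import Enum


class Err(Enum):
    """Stand-ins for SaAisErrorT values; only the names mirror the SAF codes."""
    OK = "SA_AIS_OK"
    TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"
    NO_RESOURCES = "SA_AIS_ERR_NO_RESOURCES"
    BUSY = "SA_AIS_ERR_BUSY"
    BAD_HANDLE = "SA_AIS_ERR_BAD_HANDLE"


# Codes the ticket treats as "not the client's fault, possibly temporary".
RETRYABLE = {Err.TRY_AGAIN, Err.NO_RESOURCES, Err.BUSY}


def call_with_retry(call, max_wait_s=60.0, delay_s=0.5, sleep=time.sleep):
    """Invoke `call` until it returns a non-retryable code or the budget is spent.

    Returns (last_code, attempts). The delay between attempts is needed
    because a TRY_AGAIN reply is expected to come back quickly.
    """
    waited = 0.0
    attempts = 0
    while True:
        attempts += 1
        rc = call()
        if rc not in RETRYABLE or waited >= max_wait_s:
            return rc, attempts
        sleep(delay_s)
        waited += delay_s


def make_flaky(codes):
    """Hypothetical IMM call: returns the given codes in order, then OK."""
    pending = list(codes)
    return lambda: pending.pop(0) if pending else Err.OK
```

A campaign step would wrap each IMM call in `call_with_retry` and abort only when the returned code is still an error. Since SMF has no realtime requirements, the budget can be generous (seconds to minutes).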
[tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES
I should also clarify that there is a distinction between (a) getting ERR_NO_RESOURCES as the direct result of an IMM API call (in the above case a search or accessorGet used by SMF); and (b) determining that a CCB was aborted due to a resource error and not a validation error (new API enhancement #1449).

In both cases it means that the thing/request was rejected/aborted for resource reasons. But the handling of retry is different. If the user (SMF) directly gets ERR_NO_RESOURCES returned on a call, then that specific call can be retried. But if the user (SMF) determines that a CCB has been aborted (ERR_FAILED_OPERATION) due to a resource failure (return value false on argument 'isValidationAbort' for the new API saImmOmCcbGetAbortReason), then a replay of the whole CCB can be attempted. It makes no sense here to retry the last CCB-related downcall (ccbApply or ccbValidate or ccbObjectCreate..), since the CCB has been aborted.

This distinction should be simple because in the resource-aborted CCB case you don't actually get SA_AIS_ERR_NO_RESOURCES as a return code.

The robustness of SMF campaigns can be improved on both aspects once #1449 has been delivered.

---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 10:56 AM UTC
**Owner:** nobody
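The distinction drawn in this comment, retrying a single call versus replaying a whole CCB, can be written down as a small decision function. The sketch below is Python and purely illustrative: `Action` and `next_action` are hypothetical names, return codes are passed as plain strings, and `is_validation_abort` stands for the answer that the proposed saImmOmCcbGetAbortReason (#1449) would provide.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"
    RETRY_CALL = "retry the same call"
    REPLAY_CCB = "replay the whole CCB"
    FAIL = "give up"


def next_action(rc, is_validation_abort=None):
    """Map an IMM return code to what a campaign executor could do next.

    rc: name of the return code, e.g. "SA_AIS_ERR_NO_RESOURCES".
    is_validation_abort: for SA_AIS_ERR_FAILED_OPERATION on a CCB call, the
        abort reason (True = validation, False = resources); None when the
        reason is not known.
    """
    if rc == "SA_AIS_OK":
        return Action.DONE
    if rc in ("SA_AIS_ERR_TRY_AGAIN", "SA_AIS_ERR_NO_RESOURCES"):
        # Case (a): the call itself was rejected and can be retried.
        return Action.RETRY_CALL
    if rc == "SA_AIS_ERR_FAILED_OPERATION" and is_validation_abort is False:
        # Case (b): the CCB was aborted for resource reasons; retrying the
        # last downcall is pointless, so the CCB is rebuilt from scratch.
        return Action.REPLAY_CCB
    # Validation aborts, unknown abort reasons and all other errors are final.
    return Action.FAIL
```

Treating an unknown abort reason as final is a conservative assumption made for this sketch; it corresponds to the situation before #1449 is available.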
[tickets] [opensaf:tickets] #1456 imm: IMM object has different attribute value after reloading it from PBE
- **status**: unassigned --> invalid
- **Milestone**: 4.5.2 --> 4.7-Tentative
- **Comment**:

This behavior has been well documented in the immsv README (and should also be so in the IMMSV_PR) since OpenSAF 4.3. It has worked this way since OpenSAF was created.

My opinion is that the real problem is the strange definition of default in SAF IMM as being tied to object-create. A better definition of attribute default value would be what I call strong default. This means that if you have defined a default, then it is impossible to assign the empty value (NULL) to that attribute. An attempt to do so will result in the IMM replacing it with the default value. That is a clean and more normal definition of a default. See ticket #1425.

In addition, no one has complained about this that I am aware of, and some could potentially even depend on this behavior. An application that wants to assign the NULL/empty value to an attribute should not use the current default value mechanism.

From the README:

Common misunderstandings about attribute defaults.
---
Imm class definitions allow the declaration of a default value to be defined as part of an attribute definition.

(i) A default declaration is only allowed for single valued attributes (no concept of a multivalued default exists).

(ii) Default values are assigned at object creation. Default values are NOT assigned if an attribute is set to the empty/null value by a modification.

(iii) Default values are assigned at cluster restart for any attributes that are null/empty and that have a default. This is a special case of (i) because imm loading actually uses the regular imm API to recreate the imm contents. In particular, saImmOmCcbObjectCreate is used to recreate all objects from the file-system image.
---

** [tickets:#1456] imm: IMM object has different attribute value after reloading it from PBE**

**Status:** invalid
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 21, 2015 01:05 PM UTC by Anders Widell
**Last Updated:** Fri Aug 21, 2015 01:05 PM UTC
**Owner:** nobody

There is a scenario when an IMM object can be different after reloading it from PBE, compared to what it looked like when it was saved to PBE. This happens when an attribute has a default value in the class definition, but the attribute value has been deleted (been set to NULL) in the object. When reloading the object from PBE, the attribute will again be set to the default value.
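The README rules quoted in the comment explain the scenario in this ticket, and a toy model makes the effect easy to see. The Python sketch below is an illustration only: `ImmClass`, `object_create`, `object_modify` and `reload` are hypothetical names, and the model keeps just enough state to show create-time defaults and the reload path. It is not how the IMM stores objects.

```python
class ImmClass:
    """Toy class definition: attribute name -> default (None = no default)."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)


def object_create(cls, values):
    """Defaults are applied at creation to attributes that are missing or empty."""
    obj = {}
    for attr, default in cls.defaults.items():
        value = values.get(attr)
        obj[attr] = default if value is None else value
    return obj


def object_modify(obj, attr, value):
    """A modification stores the value as given; None is NOT replaced by the default."""
    obj[attr] = value


def reload(cls, saved):
    """Loading recreates each object through the create path, so defaults return."""
    return object_create(cls, saved)


# The scenario from the ticket, with a made-up attribute name and default.
cls = ImmClass({"timeout": 10})
obj = object_create(cls, {})          # timeout gets the default
object_modify(obj, "timeout", None)   # explicitly emptied by a modification
restored = reload(cls, obj)           # the default is assigned again on reload
```

Under the "strong default" alternative mentioned in the comment, `object_modify` would also substitute the default, and the object would look the same before and after a reload.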
[tickets] [opensaf:tickets] #1445 imm: Don't check for pending fevs when only updating pure runtime attributes
- **summary**: imm: Don't check for pending fevs when updating pure runtime attributes --> imm: Don't check for pending fevs when only updating pure runtime attributes
- **Comment**:

The optimization is only for the pure and local case. This should typically be the case for a pure RTA update, which should only be the result of a read request from some client. However, the saImmOiRtObjectUpdate API actually leaves it open to the OI to update both pure RTAs and cached RTAs in the same call. Probably no one has tested that mixed variant since it has no clear use-case.

---

** [tickets:#1445] imm: Don't check for pending fevs when only updating pure runtime attributes**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Thu Aug 13, 2015 09:48 AM UTC by Hung Nguyen
**Last Updated:** Fri Aug 14, 2015 06:21 AM UTC
**Owner:** Neelakanta Reddy

When invoking saImmOiRtObjectUpdate(), the number of pending fevs messages is always checked on the server side, and TRY_AGAIN is returned when it reaches IMMSV_DEFAULT_FEVS_MAX_PENDING. If the attributes to be updated are pure runtime attributes, the number of pending fevs messages should not be checked, because the IMMD_EVT_ND2D_OI_OBJ_MODIFY message would not be sent out for broadcast to the other IMMNDs.
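The proposed change amounts to moving the backlog check behind a test for "only pure runtime attributes". A minimal Python sketch of that decision follows; `rt_object_update` and the string results are hypothetical, and the limit of 16 is an illustrative value, not one taken from the OpenSAF sources.

```python
FEVS_MAX_PENDING = 16  # illustrative limit (assumption, not from the sources)


def rt_object_update(attr_kinds, pending_fevs, max_pending=FEVS_MAX_PENDING):
    """Decide the outcome of a runtime-attribute update request.

    attr_kinds: iterable of "pure" or "cached", one entry per attribute
        touched by the call (the API allows mixing both kinds).
    Returns "LOCAL" (handled without a fevs broadcast), "TRY_AGAIN"
    (fevs backlog too large) or "BROADCAST".
    """
    kinds = list(attr_kinds)
    only_pure = bool(kinds) and all(kind == "pure" for kind in kinds)
    if only_pure:
        # Nothing is broadcast to other IMMNDs, so the backlog is irrelevant.
        return "LOCAL"
    if pending_fevs >= max_pending:
        # At least one cached attribute: the update needs fevs, apply flow control.
        return "TRY_AGAIN"
    return "BROADCAST"
```

Note that the mixed case (pure and cached attributes in one call) still goes through the backlog check, which matches the remark that the optimization is only for the pure and local case.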
[tickets] [opensaf:tickets] #1449 IMM: CCB interface for probing abort reason (validation error or resource error)
- **status**: unassigned --> assigned
- **assigned_to**: Zoran Milinkovic

---

** [tickets:#1449] IMM: CCB interface for probing abort reason (validation error or resource error)**

**Status:** assigned
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 08:33 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:34 AM UTC
**Owner:** Zoran Milinkovic

Suggested interface, closely related to saImmOmCcbGetErrorStrings():

extern SaAisErrorT saImmOmCcbGetAbortReason(SaImmCcbHandleT ccbHandle, SaBoolT* isValidationAbort);

Arguments:
ccbHandle (in) - The ccb handle.
isValidationAbort (out) - SA_TRUE if validation abort, otherwise resource abort.

Return values:
SA_AIS_OK
SA_AIS_ERR_BAD_HANDLE - bad ccb handle.
SA_AIS_ERR_INVALID_PARAM - handle is associated with a ccb that is NOT aborted.
SA_AIS_ERR_VERSION (not using A.2.xx)
[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
- **Component**: imm --> mds
- **Comment**:

The IMMD is blocked on the (asynchronous) broadcast of *one* fevs message for more than 3 minutes. Changing component to MDS.

A relevant question is what is otherwise special about this test. Is MDS TCP used and not TIPC (MDS broadcast uses TIPC multicast, which is faster)? Clearly something is over/under-dimensioned in this system. This test configuration probably needs special configuration for MDS and/or IMM (max sync batch size).

Again, I don't see that this is a defect in IMM.

---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Wed Aug 19, 2015 09:27 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2) (6.8 MB; application/x-bzip)
[tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
Ok, but then the question simply becomes: why does the healthcheck callback not reach the IMMND, or why does the IMMND reply not reach the AMFND?

/AndersBj

From: Sirisha Alla [mailto:al...@users.sf.net]
Sent: 19 August 2015 10:50
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

This issue is reproduced on changeset 6744. Syslog as follows:

Aug 19 11:54:13 SLES-64BIT-SLOT1 osafimmnd[5969]: NO implementer for class 'SaSmfSwBundle' is safSmfService = class extent is safe.
Aug 19 11:54:13 SLES-64BIT-SLOT1 osafamfnd[6054]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Aug 19 11:54:13 SLES-64BIT-SLOT1 opensafd: OpenSAF(4.7.M0 - ) services successfully started
Aug 19 11:54:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully announced dump at node 2010f. New Epoch:27
..
Aug 19 12:00:12 SLES-64BIT-SLOT1 kernel: [ 4223.945761] TIPC: Established link 1.1.1:eth0-1.1.2:eth0 on network plane A
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO New IMMND process is on STANDBY Controller at 2020f
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Extended intro from node 2020f
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: WA IMMND on controller (not currently coord) requests sync
Aug 19 12:00:13 SLES-64BIT-SLOT1 osafimmd[5958]: NO Node 2020f request sync sync-pid:5221 epoch:0
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Announce sync, epoch:30
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO SERVER STATE: IMM_SERVER_READY -- IMM_SERVER_SYNC_SERVER
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE- IMM_NODE_R_AVAILABLE
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully announced sync. New ruling epoch:30
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: logtrace: trace enabled to file /var/log/opensaf/osafimmnd, mask=0x
Aug 19 12:00:14 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafamfd[6044]: NO Node 'PL-3' left the cluster
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went down. Not sending track callback for agents on that node
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafclmd[6025]: NO Node 131855 went down. Not sending track callback for agents on that node
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Global discard node received for nodeId:2030f pid:16584
Aug 19 12:00:15 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer disconnected 15 0, 2030f(down) (MsgQueueService131855)
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876089] TIPC: Resetting link 1.1.1:eth0-1.1.3:eth0, peer not responding
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.876098] TIPC: Lost link 1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:00:20 SLES-64BIT-SLOT1 kernel: [ 4231.877196] TIPC: Lost contact with 1.1.3
Aug 19 12:00:46 SLES-64BIT-SLOT1 kernel: [ 4257.206593] TIPC: Established link 1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on saImmOmSearchNext - aborting
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: ER SYNC APPARENTLY FAILED status:1
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO -SERVER STATE: IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO NODE STATE- IMM_NODE_FULLY_AVAILABLE (2484)
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Epoch set to 30 in ImmModel
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord broadcasting ABORT_SYNC, epoch:30
Aug 19 12:01:58 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 30 committing with ccbId:10006/4294967302
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964128] TIPC: Resetting link 1.1.1:eth0-1.1.3:eth0, peer not responding
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964145] TIPC: Lost link 1.1.1:eth0-1.1.3:eth0 on network plane A
Aug 19 12:03:50 SLES-64BIT-SLOT1 kernel: [ 4441.964157] TIPC: Lost contact with 1.1.3
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: WA PBE process 5994 appears stuck on runtime data handling - sending SIGTERM
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: NO IMM PBE received SIG_TERM, closing db handle
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmpbed: IN IMM PBE process EXITING...
Aug 19 12:04:28 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Implementer locally disconnected. Marking it as doomed 11 316, 2010f (OpenSafImmPBE)
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: WA Persistent back-end process has apparently died.
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord broadcasting PBE_PRTO_PURGE_MUTATIONS, epoch:30
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO ImmModel::getPbeOi reports missing PbeOi locally = unsafe
Aug 19 12:04:29 SLES-64BIT-SLOT1 osafimmnd[5969]: NO Coord broadcasting PBE_PRTO_PURGE_MUTATIONS, epoch:30
Aug 19 12:04:30 SLES-64BIT-SLOT1 osafimmnd[5969]: NO ImmModel::getPbeOi reports missing PbeOi
[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
The IMMD sends one fevs message at a time for each poll cycle. Also, in each poll cycle it checks the AMF descriptor for healthcheck callbacks. This means that the IMMD is blocked for more than 3 minutes on broadcasting one fevs message.

IMMSV_DEFAULT_FEVS_MAX_PENDING affects the IMMND process, not the IMMD. FEVS_MAX_PENDING is there precisely in order not to overload the IMMD.

---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Wed Aug 19, 2015 09:24 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2) (6.8 MB; application/x-bzip)

The issue is observed with 4.6 FC changeset 6377. The system is up and running with a single PBE and 50k objects.

This issue is seen after http://sourceforge.net/p/opensaf/tickets/1290 is observed. An IMM application is running on the standby controller, and an immcfg command is run from a payload to set the CompRestartMax value to 1000. IMMND is killed twice on the standby controller, leading to #1290.

As a result, the standby controller left the cluster in the middle of sync, IMMD reported a healthcheck callback timeout, and the active controller too went for reboot.

Following is the syslog of SC-1:

Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 2020f:
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link 1.1.1:eth0-1.1.2:eth0, peer not responding
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link 1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with 1.1.2
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster
Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link 1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=2197', processed='center(received)=1172', processed='destination(messages)=1172', processed='destination(mailinfo)=0', processed='destination(mailwarn)=0', processed='destination(localmessages)=955', processed='destination(newserr)=0', processed='destination(mailerr)=0', processed='destination(netmgm)=0', processed='destination(warn)=44', processed='destination(console)=13', processed='destination(null)=0', processed='destination(mail)=0', processed='destination(xconsole)=13', processed='destination(firewall)=0', processed='destination(acpid)=0', processed='destination(newscrit)=0', processed='destination(newsnotice)=0', processed='source(src)=1172'
Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on saImmOmSearchNext - aborting
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED status:1
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE- IMM_NODE_FULLY_AVAILABLE (2484)
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting ABORT_SYNC, epoch:12
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing with ccbId:10054/4294967380
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation timer started (timeout: 12000 ns)
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO Performing failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count:
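The poll-cycle argument in the comment at the top of this message (one fevs broadcast per cycle, healthcheck answered in the same cycle) can be illustrated with a tiny simulation. This Python sketch is a model under stated assumptions, not IMMD code: `run_poll_loop` is a hypothetical name, the broadcast is modeled as a blocking step with a given duration, and the 180-second timeout is only chosen to match the "more than 3 minutes" mentioned above.

```python
def run_poll_loop(broadcast_times_s, healthcheck_timeout_s):
    """Simulate a single-threaded poll loop.

    Each cycle broadcasts one fevs message, which blocks the loop for
    broadcast_times_s[i] seconds, and then answers any pending healthcheck.
    Returns the index of the first cycle in which the healthcheck answer
    came later than the timeout, or None if every answer was in time.
    """
    now = 0.0
    last_answer = 0.0
    for cycle, send_time in enumerate(broadcast_times_s):
        now += send_time                      # loop is stuck in the broadcast
        if now - last_answer > healthcheck_timeout_s:
            return cycle                      # healthcheck timed out meanwhile
        last_answer = now                     # healthcheck answered this cycle
    return None
```

For example, `run_poll_loop([0.01, 200.0, 0.01], 180)` reports cycle 1: a single slow broadcast is enough to miss the healthcheck, regardless of how many messages are queued behind it. That is why the comment points at the transport rather than at the fevs backlog limit.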
[tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
Changeset 6744 was generated today, so I assume this means you reproduced this today.

The IMMND main poll handling processes each descriptor in sequence, so it should not be possible for traffic on one descriptor to starve out a job on another.

/AndersBj
[tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
Please reproduce with the IMMD trace on.

/AndersBj

From: Sirisha Alla [mailto:sirisha.a...@oracle.com]
Sent: 19 August 2015 11:07
To: [opensaf:tickets]; opensaf-tickets@lists.sourceforge.net
Subject: Re: [tickets] [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

Yes, I tried this today. The healthcheck timeout happened on IMMD, not on IMMND.

/Sirisha

On Wednesday 19 August 2015 02:28 PM, Anders Bjornerstedt wrote:

Changeset 6744 was generated today, so I assume this means you reproduced this today. The IMMND main poll handling processes each descriptor in sequence, so it should not be possible for traffic on one descriptor to starve out a job on another.

/AndersBj

From: Anders Bjornerstedt [mailto:ander...@users.sf.net]
Sent: 19 August 2015 10:54
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

OK, but then the question simply becomes: why does the healthcheck callback not reach the IMMND, or why does the IMMND reply not reach the AMFND?

/AndersBj

From: Sirisha Alla [mailto:al...@users.sf.net]
Sent: 19 August 2015 10:50
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync

This issue is reproduced on changeset 6744. Syslog as follows:

Aug 19 11:54:13 SLES-64BIT-SLOT1 osafimmnd[5969]: NO implementer for class 'SaSmfSwBundle' is safSmfService = class extent is safe.
Aug 19 11:54:13 SLES-64BIT-SLOT1 osafamfnd[6054]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Aug 19 11:54:13 SLES-64BIT-SLOT1 opensafd: OpenSAF(4.7.M0 - ) services successfully started
Aug 19 11:54:14 SLES-64BIT-SLOT1 osafimmd[5958]: NO Successfully announced dump at node 2010f. New Epoch:27
..
[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
- **Comment**:

I propose that we increase the saAmfHctDefMaxDuration value from 3 minutes to 5 minutes in:

safHealthcheckKey=Default,safVersion=4.0.0,safCompType=OpenSafCompTypeIMMND

This is the only thing that can be done in the IMMSv. The other alternatives are:

(1) Place the ticket on MDS (since the IMMND could only be blocked on an MDS asynchronous send). Yes, asynchronous send. It must be blocked in the MDS library while packing a huge message (sync buffer?) for asynchronous send. Stuck on MDS for 3 minutes.

(2) Close the ticket.

---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Fri Aug 14, 2015 07:45 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2) (6.8 MB; application/x-bzip)

The issue is observed with 4.6 FC changeset 6377. The system is up and running with single pbe and 50k objects. This issue is seen after http://sourceforge.net/p/opensaf/tickets/1290 is observed.

An IMM application is running on the standby controller and the immcfg command is run from a payload to set the CompRestartMax value to 1000. IMMND is killed twice on the standby controller, leading to #1290.

As a result, the standby controller left the cluster in the middle of the sync, IMMD reported a healthcheck callback timeout, and the active controller also went for reboot.
Following is the syslog of SC-1:

Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 2020f:
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE
Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link 1.1.1:eth0-1.1.2:eth0, peer not responding
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link 1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with 1.1.2
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node
Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster
Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link 1.1.1:eth0-1.1.2:eth0 on network plane A
Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=2197', processed='center(received)=1172', processed='destination(messages)=1172', processed='destination(mailinfo)=0', processed='destination(mailwarn)=0', processed='destination(localmessages)=955', processed='destination(newserr)=0', processed='destination(mailerr)=0', processed='destination(netmgm)=0', processed='destination(warn)=44', processed='destination(console)=13', processed='destination(null)=0', processed='destination(mail)=0', processed='destination(xconsole)=13', processed='destination(firewall)=0', processed='destination(acpid)=0', processed='destination(newscrit)=0', processed='destination(newsnotice)=0', processed='source(src)=1172'
Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on saImmOmSearchNext - aborting
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED status:1
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: IMM_SERVER_SYNC_SERVER -- IMM_SERVER_READY
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE- IMM_NODE_FULLY_AVAILABLE (2484)
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting ABORT_SYNC, epoch:12
Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing with ccbId:10054/4294967380
Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation
[tickets] [opensaf:tickets] Re: #1452 LOG: Use root name when searching for stream objects
Possibly yes. You could look at it this way. Every application is free to perform unnecessarily inefficient searches. A global search actually causes the local IMMND to copy the entire database. In this case the LOG subtree probably contains fewer than 10 objects. In general, yes, optimizations are enhancements and not defects.

Now we have the situation that in OpenSAF 4.5 we introduced support for long DNs, and this is enabled by a config attribute in the immsv. So suddenly what used to be simply a gross inefficiency in LOGsv has now also become a problem for users that want to deploy with long DNs but still want the LOG service configured. There exists no other suitable solution as I see it.

There was talk of filtering, but an implicit filter does not work since it implicitly changes the semantics of a search. An explicit filter would be possible, since then the application at least is saying that it is prepared to receive a partial result for the query. This works if the application somehow knows that all objects with long DNs are not relevant for it. In general that is a dangerous assumption. But adding an explicit filter to the immsv is also an enhancement, and quite frankly more work than the fix for the LOGsv to just scope its search appropriately to its own root object.

/AndersBj

On 08/17/2015 11:17 AM, Mathi Naickan wrote:

I guess we are not going round and round but probably just another iteration on this topic! ;-)

I think this is not about any particular IMM user. Note that there are more services that are performing this kind of 'searching from root' thing. The next question is what happens to the applications that have been performing such a search?

* Mathi.
*[tickets:#1452] http://sourceforge.net/p/opensaf/tickets/1452/ LOG: Use root name when searching for stream objects*

*Status:* review
*Milestone:* 4.7-Tentative
*Created:* Fri Aug 14, 2015 12:34 PM UTC by elunlen
*Last Updated:* Mon Aug 17, 2015 07:47 AM UTC
*Owner:* elunlen

At startup the log server searches for stream configuration objects. The search is done with no root object defined (NULL pointer for rootName in parameter). Search root should be safApp=safLogService.

---

** [tickets:#1452] LOG: Use root name when searching for stream objects**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 12:34 PM UTC by elunlen
**Last Updated:** Mon Aug 17, 2015 09:17 AM UTC
**Owner:** elunlen

At startup the log server searches for stream configuration objects. The search is done with no root object defined (NULL pointer for rootName in parameter). Search root should be safApp=safLogService.
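The difference between a search from the root and a search scoped to safApp=safLogService can be sketched as below. This is only an illustration under stated assumptions: the helper names are hypothetical and not part of IMM or LOGsv, DNs are compared as plain strings (escaped commas are ignored), and the object names in the usage note are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an IMM API.
 * True if dn is root itself or lies below it (dn ends with ",<root>").
 * A NULL or empty root stands for a search with no rootName, i.e. the whole tree. */
bool demo_dn_in_scope(const char *dn, const char *root)
{
    if (root == NULL || root[0] == '\0')
        return true;

    size_t dn_len = strlen(dn);
    size_t root_len = strlen(root);
    if (dn_len < root_len)
        return false;
    if (strcmp(dn + (dn_len - root_len), root) != 0)
        return false;

    /* Either an exact match, or the suffix must start at an RDN boundary. */
    return dn_len == root_len || dn[dn_len - root_len - 1] == ',';
}

/* Count how many objects a search rooted at root would have to consider. */
size_t demo_count_in_scope(const char *const *dns, size_t n, const char *root)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (demo_dn_in_scope(dns[i], root))
            hits++;
    }
    return hits;
}
```

With a NULL root every object in the list is in scope. With "safApp=safLogService" as root, only that object and its children are, which is the point of the proposed fix.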
[tickets] [opensaf:tickets] Re: #1452 LOG: Use root name when searching for stream objects
If we are to apply it to more than the latest branch then we must declare it as a defect. We do want to keep the ticket state signaling clean. In general you don't want enhancements on old releases because every change of behavior has risk associated with it. By only doing enhancements on the latest branch we keep the number of surprises down on the older branches. So instead of starting to make exceptions in how enhancements are handled, we need to declare it as a defect to be applied on the latest 3 branches. Long DNs were introduced in 4.5.

/AndersBj

From: elunlen [mailto:elun...@users.sf.net]
Sent: 17 August 2015 12:52
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #1452 LOG: Use root name when searching for stream objects

Maybe this is an enhancement, but what is the problem with applying this change to all active branches anyway? It's the most practical thing to do regardless of any changes to the handling of long DNs that may be done in the future. This will not change any behavior of the log service, except that it may start a little faster and use fewer resources.

/Lennart
[tickets] [opensaf:tickets] Re: #1452 LOG: Use root name when searching for stream objects
Yes in my opinion. But maybe we need a vote :)

/AndersBj

From: elunlen [mailto:elun...@users.sf.net]
Sent: 17 August 2015 15:04
To: [opensaf:tickets]
Subject: [opensaf:tickets] #1452 LOG: Use root name when searching for stream objects

Is it OK then to push this fix to all active branches and keep this ticket a defect ticket?

[tickets:#1452] http://sourceforge.net/p/opensaf/tickets/1452/ LOG: Use root name when searching for stream objects

Status: review
Milestone: 4.7-Tentative
Created: Fri Aug 14, 2015 12:34 PM UTC by elunlen
Last Updated: Mon Aug 17, 2015 12:10 PM UTC
Owner: elunlen

At startup the log server searches for stream configuration objects. The search is done with no root object defined (NULL pointer for rootName in parameter). Search root should be safApp=safLogService.

---

** [tickets:#1452] LOG: Use root name when searching for stream objects**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 12:34 PM UTC by elunlen
**Last Updated:** Mon Aug 17, 2015 01:03 PM UTC
**Owner:** elunlen

At startup the log server searches for stream configuration objects. The search is done with no root object defined (NULL pointer for rootName in parameter). Search root should be safApp=safLogService.
[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
I suggest we close this defect ticket as not reproducible, unless someone can reproduce it. My best guess is that this is yet another side effect of testing an overloaded system. Since we have no load regulation mechanism and no overload protection mechanism, it is relatively easy to push the system until it starts to break down. This is what is happening here. The IMMND misses a timely response on a healthcheck heartbeat. Such a heartbeat timeout I expect (hope) is set to 2 or 3 minutes.

The *only* reason for the healthcheck's existence is to detect and restart a hung/looping process. If a process is hung or looping it will be so indefinitely. So we don't want false positives shooting down the service just because the system is temporarily overloaded. This just adds more load. If a process has not responded in a few minutes then we really assume it is hung. The assumption here is also that a really hung process is a very rare kind of problem. This assumption is correct relative to the IMMND, because the IMMND only blocks on calls to MDS and (ironically) on some synchronous AMF calls.

---

** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync**

**Status:** assigned
**Milestone:** 4.5.2
**Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla
**Last Updated:** Fri Apr 10, 2015 01:09 PM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2) (6.8 MB; application/x-bzip)

The issue is observed with 4.6 FC changeset 6377. The system is up and running with single pbe and 50k objects. This issue is seen after http://sourceforge.net/p/opensaf/tickets/1290 is observed.

An IMM application is running on the standby controller and the immcfg command is run from a payload to set the CompRestartMax value to 1000. IMMND is killed twice on the standby controller, leading to #1290.
[tickets] [opensaf:tickets] #1445 imm: Don't check for pending fevs when updating pure runtime attributes
- **Type**: defect --> enhancement
- **Milestone**: 4.5.2 --> 4.7-Tentative
- **Comment**:

This is an enhancement, not a defect. There is no violation of interface or behavior on the IMM part related to this ticket. Any IMM call can result in TRY_AGAIN, particularly calls going remote. This call is no exception. The fact that this call is currently oversensitive to fevs overload is not a defect. Fevs overload occurs only because OpenSAF has no overload protection or load regulation mechanism. So the fact that it occurs is itself a problem or a stress test.

---

** [tickets:#1445] imm: Don't check for pending fevs when updating pure runtime attributes**

**Status:** review
**Milestone:** 4.7-Tentative
**Created:** Thu Aug 13, 2015 09:48 AM UTC by Hung Nguyen
**Last Updated:** Fri Aug 14, 2015 05:49 AM UTC
**Owner:** Neelakanta Reddy

When invoking saImmOiRtObjectUpdate(), the number of pending fevs messages is always checked on the server side, and TRY_AGAIN is returned when it reaches IMMSV_DEFAULT_FEVS_MAX_PENDING.

If the attributes to be updated are pure runtime attributes, the number of pending fevs messages should not be checked, because the IMMD_EVT_ND2D_OI_OBJ_MODIFY message wouldn't be sent out for broadcast to other IMMNDs.
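The admission rule the ticket asks for can be sketched roughly as follows. This is not the IMMND code: the function name, the verdict enum and the threshold value are hypothetical stand-ins, and the real limit is whatever IMMSV_DEFAULT_FEVS_MAX_PENDING is defined as in the IMM sources.

```c
#include <stdbool.h>

/* Placeholder value for illustration only; the real limit is
 * IMMSV_DEFAULT_FEVS_MAX_PENDING in the IMM sources. */
#define DEMO_FEVS_MAX_PENDING 16

typedef enum { DEMO_ACCEPT, DEMO_TRY_AGAIN } demo_verdict_t;

/* Hypothetical admission check for a runtime object update.
 * pending_fevs:      number of fevs messages currently pending.
 * pure_runtime_only: true if every attribute in the update is a pure
 *                    runtime attribute, so nothing would be broadcast. */
demo_verdict_t demo_admit_rt_update(unsigned pending_fevs, bool pure_runtime_only)
{
    if (pure_runtime_only) {
        /* No message goes out over fevs, so the backlog is irrelevant. */
        return DEMO_ACCEPT;
    }
    /* Otherwise keep the existing behavior: push back once the limit is reached. */
    return (pending_fevs >= DEMO_FEVS_MAX_PENDING) ? DEMO_TRY_AGAIN : DEMO_ACCEPT;
}
```

The only change relative to the behavior described in the ticket is the early return: an update that touches pure runtime attributes alone is admitted even when the fevs backlog is at its limit.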
[tickets] [opensaf:tickets] #1448 smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES
---

** [tickets:#1448] smf: Make campaigns less fragile by retrying on ERR_NO_RESOURCES**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 07:09 AM UTC
**Owner:** nobody

The SMF service is a heavy user of the IMM service. The IMM has an established client pattern for ERR_TRY_AGAIN which allows an application realtime control over how long it is prepared to wait for a transient inability of the IMM service to fulfill a request. Each response of TRY_AGAIN should in itself be fast, so the application needs a delay in its retry loop.

There is also the very similar error code ERR_NO_RESOURCES. Logically that error code is identical to TRY_AGAIN in that the request could not be accepted due to no fault of the client but due to some more or less temporary problem in the IMM service. The difference is that NO_RESOURCES has no realtime ambitions. Typically this error code is used by the IMM when it cannot fulfill a request for reasons that are outside of the IMM service's control. Also, the time from request to a response of ERR_NO_RESOURCES may be long.

The SMF service in general has no realtime requirements. The main goal for the SMF service is to successfully complete correctly formulated campaigns. This means that the SMF service should be programmed to avoid unnecessary fragility related to temporary problems, even if the temporary problem could linger for seconds or minutes. The alternative of aborting the campaign will itself discard potentially large execution times already completed. It may sometimes even result in a system restore.

This means that SMF campaigns should have a retry loop that handles not just TRY_AGAIN, but also ERR_NO_RESOURCES where this return code is relevant (can be returned according to the API spec).

The error code ERR_BUSY also exists and is for all practical purposes identical to ERR_NO_RESOURCES in semantics, both logical and timing.
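A retry loop of the kind described above might look like the sketch below. Everything here is illustrative: the enum is a stand-in for the SaAisErrorT codes named in the ticket, and the function names, the callback types and the attempt limit are assumptions, not SMF or IMM code. The delay is injected as a callback so that the caller decides how long to wait between attempts.

```c
#include <stdbool.h>

/* Stand-ins for SA_AIS_OK, SA_AIS_ERR_TRY_AGAIN, SA_AIS_ERR_NO_RESOURCES,
 * SA_AIS_ERR_BUSY and "any other error"; not the real SaAisErrorT values. */
typedef enum {
    DEMO_OK,
    DEMO_TRY_AGAIN,
    DEMO_NO_RESOURCES,
    DEMO_BUSY,
    DEMO_FAILED
} demo_err_t;

typedef demo_err_t (*demo_request_fn)(void *ctx); /* one IMM-style request */
typedef void (*demo_wait_fn)(long delay_ms);      /* delay between attempts */

/* The transient codes the ticket says a campaign should ride out. */
bool demo_is_transient(demo_err_t rc)
{
    return rc == DEMO_TRY_AGAIN || rc == DEMO_NO_RESOURCES || rc == DEMO_BUSY;
}

/* Repeat req until it returns a non-transient code or max_attempts is used up.
 * Returns the last code seen, so the caller can still tell why it gave up. */
demo_err_t demo_retry(demo_request_fn req, void *ctx, unsigned max_attempts,
                      long delay_ms, demo_wait_fn wait)
{
    demo_err_t rc = DEMO_FAILED;
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        rc = req(ctx);
        if (!demo_is_transient(rc))
            break;
        if (attempt + 1 < max_attempts)
            wait(delay_ms); /* no point in waiting after the final attempt */
    }
    return rc;
}

/* Demonstration request: transient failures until the counter in ctx hits zero. */
demo_err_t demo_flaky_request(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return (*remaining % 2) ? DEMO_NO_RESOURCES : DEMO_TRY_AGAIN;
    }
    return DEMO_OK;
}

/* No-op delay for the demonstration; a real caller would sleep here. */
void demo_no_wait(long delay_ms)
{
    (void)delay_ms;
}
```

Since the ticket notes that ERR_NO_RESOURCES may take long to come back and that SMF has no realtime requirements, a real caller would probably choose a generous attempt limit and delay. The numbers are deliberately left to the caller in this sketch.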
[tickets] [opensaf:tickets] #58 IMM: IMM internally should create CCB error strings
- **Milestone**: future --> 4.6.1

---

** [tickets:#58] IMM: IMM internally should create CCB error strings**

**Status:** fixed
**Milestone:** 4.6.1
**Created:** Wed May 08, 2013 08:35 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:04 AM UTC
**Owner:** nobody

Migrated from: http://devel.opensaf.org/ticket/2712

For example in this case:

Jul 3 12:39:55.177799 osafimmnd [17744:ImmModel.cc:5146] T7 ERR_NOT_EXIST: object 'safSmfBundle=XXX' does not have an implementer and flag SA_IMM_CCB_REGISTERED_OI is set
[tickets] [opensaf:tickets] #1449 IMM: CCB interface for probing abort reason (validation error or resource error)
---

** [tickets:#1449] IMM: CCB interface for probing abort reason (validation error or resource error)**

**Status:** unassigned
**Milestone:** 4.7-Tentative
**Created:** Fri Aug 14, 2015 08:33 AM UTC by Anders Bjornerstedt
**Last Updated:** Fri Aug 14, 2015 08:33 AM UTC
**Owner:** nobody

Suggested interface, closely related to saImmOmCcbGetErrorStrings():

extern SaAisErrorT saImmOmCcbGetAbortReason(SaImmCcbHandleT ccbHandle, SaBoolT* isValidationAbort);

Arguments:

ccbHandle (in) - The ccb handle.
isValidationAbort (out) - SA_TRUE if validation abort, otherwise resource abort.

Return Values:

SA_AIS_ERR_BAD_HANDLE - bad ccb handle.
SA_AIS_ERR_INVALID_PARAM - handle is associated with ccb that is NOT aborted.
SA_AIS_ERR_VERSION (not using A.2.xx)
[tickets] [opensaf:tickets] #1449 IMM: CCB interface for probing abort reason (validation error or resource error)
- Description has changed: Diff: --- old +++ new @@ -6,6 +6,7 @@ Arguments : ccbHandle (in)-The ccb handle. isValidationAbort (out) - SA_TRUE if validation abort otherwise resource abort. Return Values : + SA_AIS_OK SA_AIS_ERR_BAD_HANDLE - bad ccb handle. SA_AIS_ERR_INVALID_PARAM - handle is associated with ccb that is NOT aborted. SA_AIS_ERR_VERSION (not using A.2.xx) --- ** [tickets:#1449] IMM: CCB interface for probing abort reason (validation error or resource error)** **Status:** unassigned **Milestone:** 4.7-Tentative **Created:** Fri Aug 14, 2015 08:33 AM UTC by Anders Bjornerstedt **Last Updated:** Fri Aug 14, 2015 08:33 AM UTC **Owner:** nobody Suggested interface, closely related to saImmOmCcbGetErrorStrings(): extern SaAisErrorT saImmOmCcbGetAbortReason(SaImmCcbHandleT ccbHandle, SaBoolT* isValidationAbort); Arguments : ccbHandle (in)-The ccb handle. isValidationAbort (out) - SA_TRUE if validation abort otherwise resource abort. Return Values : SA_AIS_OK SA_AIS_ERR_BAD_HANDLE - bad ccb handle. SA_AIS_ERR_INVALID_PARAM - handle is associated with ccb that is NOT aborted. SA_AIS_ERR_VERSION (not using A.2.xx)
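The return-value contract proposed in #1449 can be modelled in a few lines. This is only a sketch of the suggested semantics, not OpenSAF code: `CcbState`, `ccb_get_abort_reason` and the handle table are hypothetical names, the order in which the error conditions are checked is an assumption (the ticket does not specify it), and the numeric codes are the standard SaAisErrorT values from saAis.h.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional, Tuple


class SaAisErrorT(IntEnum):
    # Subset of the AIS return codes named in the ticket.
    SA_AIS_OK = 1
    SA_AIS_ERR_VERSION = 3
    SA_AIS_ERR_INVALID_PARAM = 7
    SA_AIS_ERR_BAD_HANDLE = 9


@dataclass
class CcbState:
    """Hypothetical client-side record of one CCB handle."""
    api_version: Tuple[str, int]      # (release code, major), e.g. ("A", 2)
    aborted: bool = False
    validation_abort: bool = False    # only meaningful when aborted is True


def ccb_get_abort_reason(
    ccbs: Dict[int, CcbState], ccb_handle: int
) -> Tuple[SaAisErrorT, Optional[bool]]:
    """Model of the proposed saImmOmCcbGetAbortReason().

    Returns (rc, is_validation_abort). The second element plays the role
    of the out parameter and is None unless rc is SA_AIS_OK.
    Check order (handle, version, abort state) is an assumption.
    """
    ccb = ccbs.get(ccb_handle)
    if ccb is None:
        return SaAisErrorT.SA_AIS_ERR_BAD_HANDLE, None
    if ccb.api_version != ("A", 2):
        return SaAisErrorT.SA_AIS_ERR_VERSION, None
    if not ccb.aborted:
        return SaAisErrorT.SA_AIS_ERR_INVALID_PARAM, None
    return SaAisErrorT.SA_AIS_OK, ccb.validation_abort
```

A caller would treat a True result as a validation abort and a False result as a resource abort, and would get no classification at all for a CCB that has not been aborted.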
[tickets] [opensaf:tickets] #58 IMM: IMM internally should create CCB error strings
- **status**: unassigned -- fixed - **Comment**: Since some releases back the IMM does generate error strings internally. The above specific example is covered. --- ** [tickets:#58] IMM: IMM internally should create CCB error strings** **Status:** fixed **Milestone:** future **Created:** Wed May 08, 2013 08:35 AM UTC by Anders Bjornerstedt **Last Updated:** Wed May 08, 2013 08:35 AM UTC **Owner:** nobody Migrated from: http://devel.opensaf.org/ticket/2712 For example in this case: Jul 3 12:39:55.177799 osafimmnd [17744:ImmModel.cc:5146] T7 ERR_NOT_EXIST: object 'safSmfBundle=XXX' does not have an implementer and flag SA_IMM_CCB_REGISTERED_OI is set
[tickets] [opensaf:tickets] #744 IMM: Use error string to classify cause for aborted CCB.
Some work on this has already been done. It remains to: 1) Survey the imm server CCB abort handling to see if there are cases where the error string is missing or the abort cause prefix is missing. 2) Ensure that the prefix is uniform, i.e. amenable to string matching. The ccb abort reason prefix is prepended to the text so that the abort reason becomes obvious to a human user (e.g. immcfg result). The intention is not to have programmers directly do string matching on the abort prefix. Even if that is technically possible we will provide a wrapper function for this to be used for programmable handling of the abort reason. A separate ticket will be created for that. --- ** [tickets:#744] IMM: Use error string to classify cause for aborted CCB.** **Status:** unassigned **Milestone:** future **Created:** Thu Jan 23, 2014 12:17 PM UTC by Anders Bjornerstedt **Last Updated:** Tue Jun 30, 2015 12:30 PM UTC **Owner:** nobody This is a special case of #58 (http://sourceforge.net/p/opensaf/tickets/58). Enhancement #58 is a bit large and open-ended. This ticket focuses on a particular need for complementary information about one error return code. If a CCB related operation returns SA_AIS_ERR_FAILED_OPERATION it means that the CCB has been aborted and the CCB-handle can no longer be used to generate new (chained) CCBs. The cause of the aborted CCB can be classified into two broad mutually exclusive categories: 1) Logical errors related to the CCB buildup/contents. This would primarily be validation errors where an OI rejects a ccb-operation or rejects an apply. 2) Resource problems in the immsv. This could be the need for imm-sync that gets priority over current non-empty CCBs that are not in critical. Or it could be a hung PBE that gets restarted and finds the CCB did not complete the commit, resulting in an abort. Or other reasons in immsv or below. 
Some applications that have the capability to record an attempted CCB at the application level may wish to attempt a replay of an aborted CCB, but only if the CCB was aborted due to a cause in category (2). One could refine this to distinguish within (2) between definitely transient resource problems (imm sync) and likely stable resource limits (huge CCBs that fail to commit over PBE). The latter are more likely to fail repeatedly. But such refinement will not be done in this ticket. The idea is to prefix the error string with an identifiable tag of some form. The tag must be documented in the IMMSV README and the IMMSV_PR. This would make it relatively simple for an application developer to write code to match against the initial sub-string.
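The prefix matching described in #744 might look roughly like the sketch below. The two prefix constants are placeholders chosen for illustration; the ticket only requires that the real tag be documented in the IMMSV README, so check there before relying on any literal value. `AbortCause`, `classify_abort` and `should_replay` are hypothetical helper names, and the input is assumed to be the list of error strings already fetched for the aborted CCB.

```python
from enum import Enum
from typing import Iterable

# Placeholder tags (assumption); the authoritative values are whatever
# the IMMSV README documents for the release in use.
VALIDATION_PREFIX = "IMM: Validation abort: "
RESOURCE_PREFIX = "IMM: Resource abort: "


class AbortCause(Enum):
    VALIDATION = "validation"   # category (1): logical or validation error
    RESOURCE = "resource"       # category (2): resource problem in immsv
    UNKNOWN = "unknown"         # no recognisable prefix found


def classify_abort(error_strings: Iterable[str]) -> AbortCause:
    """Classify an aborted CCB from its error strings by initial sub-string."""
    for text in error_strings:
        if text.startswith(VALIDATION_PREFIX):
            return AbortCause.VALIDATION
        if text.startswith(RESOURCE_PREFIX):
            return AbortCause.RESOURCE
    return AbortCause.UNKNOWN


def should_replay(error_strings: Iterable[str]) -> bool:
    """Replay only when the abort is known to be a resource abort."""
    return classify_abort(error_strings) is AbortCause.RESOURCE
```

Treating an unrecognised string as UNKNOWN rather than as a resource abort keeps an application from replaying a CCB that an OI deliberately rejected.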
[tickets] [opensaf:tickets] #744 IMM: Use error string to classify cause for aborted CCB.
- **Milestone**: future -- 4.7-Tentative --- ** [tickets:#744] IMM: Use error string to classify cause for aborted CCB.** **Status:** unassigned **Milestone:** 4.7-Tentative **Created:** Thu Jan 23, 2014 12:17 PM UTC by Anders Bjornerstedt **Last Updated:** Fri Aug 14, 2015 08:13 AM UTC **Owner:** nobody This is a special case of #58 (http://sourceforge.net/p/opensaf/tickets/58). Enhancement #58 is a bit large and open-ended. This ticket focuses on a particular need for complementary information about one error return code. If a CCB related operation returns SA_AIS_ERR_FAILED_OPERATION it means that the CCB has been aborted and the CCB-handle can no longer be used to generate new (chained) CCBs. The cause of the aborted CCB can be classified into two broad mutually exclusive categories: 1) Logical errors related to the CCB buildup/contents. This would primarily be validation errors where an OI rejects a ccb-operation or rejects an apply. 2) Resource problems in the immsv. This could be the need for imm-sync that gets priority over current non-empty CCBs that are not in critical. Or it could be a hung PBE that gets restarted and finds the CCB did not complete the commit, resulting in an abort. Or other reasons in immsv or below. Some applications that have the capability to record an attempted CCB at the application level, may wish to attempt a replay of an aborted CCB, but only if the CCB was aborted due to a cause in category (2). One could refine this to distinguish within (2) between definitely transient resource problems (imm sync) from likely stable resource limits (huge CCBs that fail to commit over PBE). The latter are more likely to repeatedly fail. But such refinement will not be done in this ticket. The idea is to prefix the error string with an identifiable tag of some form. The tag must be documented in the IMMSV README and the IMMSV_PR. 
This would make it relatively simple for an application developer to write code to match against the initial sub-string.
[tickets] [opensaf:tickets] #1313 osaf: opensaf does not start when long dn object is present in imm.db and cluster is reset
- **Comment**: I think this ticket should be closed as invalid. The mechanism works as documented. The only relevant defect related to this incident is #1430 and it has been fixed. https://sourceforge.net/p/opensaf/tickets/1430/ That is an application that does a search with a scope that does *not* include any long DN object should not get hit by any longDn error. --- ** [tickets:#1313] osaf: opensaf does not start when long dn object is present in imm.db and cluster is reset** **Status:** unassigned **Milestone:** 4.5.1 **Created:** Mon Apr 13, 2015 08:57 AM UTC by Sirisha Alla **Last Updated:** Wed Apr 22, 2015 07:05 AM UTC **Owner:** Mathi Naickan **Attachments:** - [slot1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1313/attachment/slot1.tar.bz2) (269.6 kB; application/x-bzip) This is observed on changeset 6377 (46FC Tag). The system is up with single pbe and 50k objects. Long dns was enabled. There is one long dn object in the cluster. Syslog on SC-1: Apr 9 15:49:14 SLES-64BIT-SLOT1 osafimmnd[10731]: WA Setting attr longDnsAllowed to 0 in opensafImm=opensafImm,safApp=safImmService not allowed when long RDN exists inside object: xattrName_testAdminOwnerClear_SubLevelScope_1011 Now the cluster is reset. Nodes in the cluster fail to come up with the following reason: Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Persistent Back End OI attached, pid: 3465 Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO Implementer connected: 1 (OpenSafImmPBE) 10, 2010f Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmnd[3439]: NO implementer for class 'OpensafImm' is OpenSafImmPBE = class extent is safe. Apr 13 13:04:55 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 20 committing with ccbId:10003/4294967299 Apr 13 13:04:56 SLES-64BIT-SLOT1 osafimmnd[3439]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from LOGD Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Going for recovery Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-logd attempt #1 Apr 13 13:05:34 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, pid=3452 Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: Started Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: WA read_logsv_configuration(). All attributes could not be read Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO Log config system: high 0 low 0, application: high 0 low 0 Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO log root directory is: /var/log/opensaf/saflog Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LOG data group is: Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: NO LGS_MBCSV_VERSION = 4 Apr 13 13:05:49 SLES-64BIT-SLOT1 osaflogd[3500]: saImmOmSearchInitialize FAILED, rc = 13 Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from LOGD Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-logd attempt #2 Apr 13 13:06:29 SLES-64BIT-SLOT1 opensafd[3378]: ER Sending SIGKILL to LOGD, pid=3495 Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: Started Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: WA read_logsv_configuration(). 
All attributes could not be read Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO Log config system: high 0 low 0, application: high 0 low 0 Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO log root directory is: /var/log/opensaf/saflog Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LOG data group is: Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: NO LGS_MBCSV_VERSION = 4 Apr 13 13:06:44 SLES-64BIT-SLOT1 osaflogd[3546]: saImmOmSearchInitialize FAILED, rc = 13 Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Timed-out for response from LOGD Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Could Not RESPAWN LOGD Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER Apr 13 13:07:24 SLES-64BIT-SLOT1 opensafd[3378]: ER FAILED TO RESPAWN Apr 13 13:07:24 SLES-64BIT-SLOT1 osaffmd[3419]: exiting for shutdown Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmd[3429]: exiting for shutdown Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmnd[3439]: NO No IMMD service = cluster restart, exiting Apr 13 13:07:24 SLES-64BIT-SLOT1 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting Apr 13 13:07:24 SLES-64BIT-SLOT1 osafrded[3410]: exiting for shutdown Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782513] TIPC: Disabling bearer eth:eth0 Apr 13 13:07:24 SLES-64BIT-SLOT1 kernel: [ 1630.782518] TIPC: Lost
[tickets] [opensaf:tickets] Re: #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup.
It makes absolutely no sense to have a defect ticket on a future release. /AndersBj From: A V Mahesh (AVM) [mailto:avmah...@users.sf.net] Sent: den 6 augusti 2015 06:22 To: [opensaf:tickets] Subject: [opensaf:tickets] #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. * Type: enhancement -- defect * Milestone: future -- 4.7-Tentative * Comment: Need to reproduce on current staging , if issue exist need to be fixed in 4.7 release [tickets:#246]http://sourceforge.net/p/opensaf/tickets/246/ cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. Status: assigned Milestone: 4.7-Tentative Created: Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM) Last Updated: Wed Jul 15, 2015 02:46 PM UTC Owner: A V Mahesh (AVM) from http://devel.opensaf.org/ticket/2386 Changeset: 3065 Setup: 70 node SLES11 VM setup 2 applications per node are running on a 70 node setup. Collocated checkpoint is created. After active replica is set from one process, section create with section id as GENERATED_SECTION_ID is invoked from rest of the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, ERR_TRY_AGAIN. /var/log/messages for the two controllers will be shared. Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/opensaf/tickets/246/ To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/ --- ** [tickets:#246] cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. 
** **Status:** assigned **Milestone:** 4.7-Tentative **Created:** Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu Aug 06, 2015 04:21 AM UTC **Owner:** A V Mahesh (AVM) from http://devel.opensaf.org/ticket/2386 Changeset: 3065 Setup: 70 node SLES11 VM setup 2 applications per node are running on a 70 node setup. Collocated checkpoint is created. After active replica is set from one process, section create with section id as GENERATED_SECTION_ID is invoked from rest of the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, ERR_TRY_AGAIN. /var/log/messages for the two controllers will be shared.
[tickets] [opensaf:tickets] Re: #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup.
What I mean is: This is an old ticket reporting a problem observed on an old release (4.2 according to the ticket). But you are declaring that the problem will only be fixed for a future release! If that is the intention then this *is* by definition an enhancement and not a defect. /AndersBj From: Anders Bjornerstedt [mailto:ander...@users.sf.net] Sent: den 10 augusti 2015 09:02 To: [opensaf:tickets] Subject: [opensaf:tickets] Re: #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. It makes absolutely no sense to have a defect ticket on a future release. /AndersBj From: A V Mahesh (AVM) [mailto:avmah...@users.sf.net] Sent: den 6 augusti 2015 06:22 To: [opensaf:tickets] Subject: [opensaf:tickets] #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. * Type: enhancement -- defect * Milestone: future -- 4.7-Tentative * Comment: Need to reproduce on current staging , if issue exist need to be fixed in 4.7 release [tickets:#246] http://sourceforge.net/p/opensaf/tickets/246/ cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. Status: assigned Milestone: 4.7-Tentative Created: Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM) Last Updated: Wed Jul 15, 2015 02:46 PM UTC Owner: A V Mahesh (AVM) from http://devel.opensaf.org/ticket/2386 Changeset: 3065 Setup: 70 node SLES11 VM setup 2 applications per node are running on a 70 node setup. Collocated checkpoint is created. After active replica is set from one process, section create with section id as GENERATED_SECTION_ID is invoked from rest of the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, ERR_TRY_AGAIN. /var/log/messages for the two controllers will be shared. 
Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/opensaf/tickets/246/ To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/ [tickets:#246]http://sourceforge.net/p/opensaf/tickets/246/ cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. Status: assigned Milestone: 4.7-Tentative Created: Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM) Last Updated: Thu Aug 06, 2015 04:21 AM UTC Owner: A V Mahesh (AVM) from http://devel.opensaf.org/ticket/2386 Changeset: 3065 Setup: 70 node SLES11 VM setup 2 applications per node are running on a 70 node setup. Collocated checkpoint is created. After active replica is set from one process, section create with section id as GENERATED_SECTION_ID is invoked from rest of the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, ERR_TRY_AGAIN. /var/log/messages for the two controllers will be shared. Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/opensaf/tickets/246/ To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/ --- ** [tickets:#246] cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup. ** **Status:** assigned **Milestone:** 4.7-Tentative **Created:** Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM) **Last Updated:** Thu Aug 06, 2015 04:21 AM UTC **Owner:** A V Mahesh (AVM) from http://devel.opensaf.org/ticket/2386 Changeset: 3065 Setup: 70 node SLES11 VM setup 2 applications per node are running on a 70 node setup. Collocated checkpoint is created. After active replica is set from one process, section create with section id as GENERATED_SECTION_ID is invoked from rest of the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, ERR_TRY_AGAIN. 
/var/log/messages for the two controllers will be shared.
[tickets] [opensaf:tickets] #1436 MDS (TCP transport) fragment gets dropped, not received on standby node
- **Milestone**: 4.7-Tentative -- 4.6.1 --- ** [tickets:#1436] MDS (TCP transport) fragment gets dropped, not received on standby node** **Status:** unassigned **Milestone:** 4.6.1 **Created:** Thu Aug 06, 2015 06:47 AM UTC by Girish **Last Updated:** Mon Aug 10, 2015 04:25 AM UTC **Owner:** nobody **Attachments:** - [cpsv_test_app.c](https://sourceforge.net/p/opensaf/tickets/1436/attachment/cpsv_test_app.c) (8.5 kB; text/x-csrc) Opensaf version: 4.6. Linux: standard Fedora 22 release, no additional patches required; default wmem_max/rmem_max values; default buffer sizes for MDS_SOCK_SND_RCV_BUF_SIZE and DTM_SOCK_SND_RCV_BUF_SIZE; active-standby model; opensaf run as root user/group. Steps: 1. start opensaf on node1 (active) and node2 (standby) 2. start ckpt_demo (modified application attached) on active node, ./ckpt_demo 1 3. wait till all the data is checkpointed 4. start ckpt_demo on standby node, ./ckpt_demo 0 Notice the error messages in mds.log: MDTM: Some stale message recd, hence dropping adest= My investigation indicates that one of the fragments is lost: the active node sends it, whereas the standby node does not receive it. 
mds log on standby: May 29 4:30:03.089974 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.089995 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.090014 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3049, from src_Tipc_id=0x0002020f:25826, pkt_type=35817 May 29 4:30:03.090032 8461 ERR|MDTM: Reassembling in FULL UB May 29 4:30:03.090174 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.090198 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.090216 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.090238 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.090257 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3050, from src_Tipc_id=0x0002020f:25826, pkt_type=35818 May 29 4:30:03.090275 8461 ERR|MDTM: Reassembling in FULL UB May 29 4:30:03.090735 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.090762 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.090780 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.090801 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.090820 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3051, from src_Tipc_id=0x0002020f:25826, pkt_type=35819 May 29 4:30:03.090838 8461 ERR|MDTM: Reassembling in FULL UB May 29 4:30:03.090978 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.091028 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.091047 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.091068 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.091087 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3053, from src_Tipc_id=0x0002020f:25826, pkt_type=35821 May 29 4:30:03.091106 8461 ERR|MDTM: ERROR Frag recd is not next frag so dropping 
adest=0x0002020f64e2 May 29 4:30:03.091125 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.091143 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.091160 8461 ERR| mdtm_process_poll_recv_data_tcp May 29 4:30:03.091180 8461 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:30:03.091198 8461 ERR|MDTM: Recd message with Fragment Seqnum=18, frag_num=3054, from src_Tipc_id=0x0002020f:25826, pkt_type=35822 May 29 4:30:03.091216 8461 ERR|MDTM: Message is dropped as msg is out of seq TRANSPOR-ID=0x0002020f64e2 May 29 4:30:03.091235 8461 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:30:03.091283 8461 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:30:03.091302 8461 ERR| mdtm_process_poll_recv_data_tcp mds log on active: May 29 4:29:36.021518 25826 ERR|before mds_mdtm_process_recvdata fun-call 1, recd_bytes=1454, buff_toal_len=1454 May 29 4:29:36.021537 25826 ERR|MDTM: Recd message with Fragment Seqnum=5, frag_num=3049, from src_Tipc_id=0x0002020f:25995, pkt_type=35817 May 29 4:29:36.021554 25826 ERR|MDTM: Reassembling in flat UB May 29 4:29:36.021702 25995 ERR|successfully sent message, send_len=1456 May 29 4:29:36.021729 25995 ERR|MDTM:2 Sending message with Service Seqno=4, Fragment Seqnum=5, frag_num=35818, TO Dest_Tipc_id=0x0002020f:25826 May 29 4:29:36.021778 25826 ERR|mdtm_process_recv_events_tcp: pollres=1 May 29 4:29:36.021800 25826 ERR|mdtm_process_recv_events_tcp: pfd[0].revents=1 May 29 4:29:36.021817 25826 ERR| mdtm_process_poll_recv_data_tcp May 29
[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag
- **status**: review -- fixed - **Comment**: changeset: 6681:87d18f870326 tag: tip user: Johan Mårtensson johan.o.martens...@ericsson.com date: Thu Jul 16 14:25:08 2015 +0200 summary: pyosaf: (updated) Add parameter to Ccb constructor to set exact CCB flags [#1417] --- ** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag** **Status:** fixed **Milestone:** 4.7-Tentative **Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson **Last Updated:** Wed Jul 15, 2015 12:43 PM UTC **Owner:** Johan Mårtensson The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user code to make changes to IMM via CCBs. It unconditionally sets the SA_IMM_CCB_REGISTERED_OI flag, which means that any code that requires the flag to be unset must use the low-level interface instead. The Ccb class should be enhanced to allow turning SA_IMM_CCB_REGISTERED_OI off.
[tickets] [opensaf:tickets] #1425 IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT
--- ** [tickets:#1425] IMM: Add attribute def flag SA_IMM_ATTR_STRONG_DEFAULT** **Status:** unassigned **Milestone:** future **Created:** Fri Jul 24, 2015 12:49 PM UTC by Anders Bjornerstedt **Last Updated:** Fri Jul 24, 2015 12:49 PM UTC **Owner:** nobody The saImmOmClassCreate_2() API allows the user to provide a list of attribute definitions. An attribute definition may include a default value. The default value will be assigned to this attribute in an instance being created by the saImmOmCcbObjectCreate_2() or the saImmOiRtObjectCreate_2() APIs, if the user does not provide a value for that attribute. But a user/OI may later update such an object/attribute assigning the empty value to the attribute. So the default value mechanism is only effective for object creation and not later in the life cycle of the object. This makes the default attribute value mechanism weaker than some users would like. This enhancement proposes a new attribute flag SA_IMM_ATTR_STRONG_DEFAULT. This flag will only be allowed to be set on an attribute definition that includes a default value. The meaning of the flag is that if a user attempts an update of an object/attribute that assigns the empty value to such an attribute, then the IMM will replace, i.e. override, that value with the default value defined in the class.
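The difference between today's default handling and the proposed SA_IMM_ATTR_STRONG_DEFAULT flag in #1425 can be illustrated with a small model. This is a sketch of the described semantics only, not IMM code: `AttrDef`, `create_object` and `modify_object` are hypothetical names, objects are plain dictionaries, and the empty value is represented by `None`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AttrDef:
    """Hypothetical attribute definition; None stands for 'no default'."""
    name: str
    default: Optional[Any] = None
    strong_default: bool = False   # models SA_IMM_ATTR_STRONG_DEFAULT

    def __post_init__(self) -> None:
        # The flag is only allowed on definitions that include a default.
        if self.strong_default and self.default is None:
            raise ValueError(f"{self.name}: strong default requires a default value")


def create_object(attr_defs: List[AttrDef], values: Dict[str, Any]) -> Dict[str, Any]:
    """At creation, every attribute the user omits gets its class default."""
    return {d.name: values.get(d.name, d.default) for d in attr_defs}


def modify_object(obj: Dict[str, Any], attr_defs: List[AttrDef],
                  changes: Dict[str, Any]) -> None:
    """Apply updates in place; an empty value on a strong-default
    attribute is overridden with the class default."""
    defs = {d.name: d for d in attr_defs}
    for name, value in changes.items():
        d = defs[name]
        if value is None and d.strong_default:
            value = d.default
        obj[name] = value
```

With one ordinary and one strong-default attribute, assigning the empty value to both leaves the ordinary attribute empty while the strong-default attribute falls back to its class default.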
[tickets] [opensaf:tickets] #1424 Agent crash leaving zombie OI
- **status**: unassigned -- assigned - **assigned_to**: Hung Nguyen --- ** [tickets:#1424] Agent crash leaving zombie OI** **Status:** assigned **Milestone:** 4.5.2 **Created:** Wed Jul 22, 2015 10:34 AM UTC by Hung Nguyen **Last Updated:** Wed Jul 22, 2015 10:34 AM UTC **Owner:** Hung Nguyen IMCN sets IMMA_SYNCR_TIMEOUT to 1 sec. For some reason osafntfimcnd got a timeout when invoking saImmOiImplementerSet() and then exited. Jun 25 16:24:49 SC-1 osafntfimcnd[14926]: ER ntfimcn_imm_init Becoming an applier failed SA_AIS_ERR_TIMEOUT (5) Jun 25 16:24:49 SC-1 osafntfimcnd[14926]: ER ntfimcn_imm_init() Fail IMMND got the IMMA_DOWN event before receiving IMMND_EVT_D2ND_IMPLSET_RSP from IMMD. IMMND tried to discard the implementer of the client, but there was nothing to discard at that moment. Later, IMMND received IMMND_EVT_D2ND_IMPLSET_RSP and the implementer was added to immModel. Jun 25 16:24:50 SC-1 osafimmnd[14887]: NO Implementer (applier) connected: 6 (@OpenSafImmReplicatorB) 15, 2010f Jun 25 16:24:50 SC-1 osafimmnd[14887]: WA IMMND - Client went down so no response So when IMMND uses immnd_client_node_get() to get the client node of the implementer, it returns NULL and the assertion fails. In this case, that happened in immnd_evt_proc_object_modify(). Jun 25 16:25:04 SC-1 osafimmnd[14887]: immnd_evt.c:6242: immnd_evt_proc_object_modify: Assertion 'oi_cl_node != NULL' failed.
[tickets] [opensaf:tickets] #171 saAmfComponentUnregister should be unexposed to handle obtained from B 4 1 version
- **Type**: defect -- enhancement --- ** [tickets:#171] saAmfComponentUnregister should be unexposed to handle obtained from B 4 1 version** **Status:** unassigned **Milestone:** future **Created:** Tue May 14, 2013 06:01 AM UTC by Nagendra Kumar **Last Updated:** Tue May 14, 2013 06:01 AM UTC **Owner:** nobody Migrated from http://devel.opensaf.org/ticket/2019 saAmfComponentUnregister api should return SA_AIS_ERR_VERSION when called with handle obtained from B 4 1 version.
[tickets] [opensaf:tickets] #167 AMF: CSI descriptor for standby assignment contains wrong info
- **Type**: defect -- enhancement --- ** [tickets:#167] AMF: CSI descriptor for standby assignment contains wrong info** **Status:** unassigned **Milestone:** future **Created:** Tue May 14, 2013 04:50 AM UTC by Nagendra Kumar **Last Updated:** Tue May 14, 2013 04:50 AM UTC **Owner:** nobody Migrated from http://devel.opensaf.org/ticket/1790 Testing with the config as specified in 3.6.4. After having started a 3 node cluster the CSI set callbacks for standby assignments in the demo comp gets standbyRank always 0, activeCompName is not correct. Dispatched healthCheck 2 in 'safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo?' Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI1,safApp=AmfDemo?' HAState: Active CSIFlags: Add One Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI4,safApp=AmfDemo?' HAState: Active CSIFlags: Add One Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI5,safApp=AmfDemo?' HAState: Active CSIFlags: Add One Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI2,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI3,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI6,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo? Dispatched healthCheck 1 in 'safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo?' Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI2,safApp=AmfDemo?' HAState: Active CSIFlags: Add One Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI6,safApp=AmfDemo?' HAState: Active CSIFlags: Add One Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI1,safApp=AmfDemo?' 
HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI3,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI4,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI5,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo? Dispatched healthCheck 1 in 'safComp=AmfDemo?,safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo?' Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI3,safApp=AmfDemo?' HAState: Active CSIFlags: Add One Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI1,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI2,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI4,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI5,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU2,safSg=AmfDemo?,safApp=AmfDemo? Dispatched 'CSI Set' CSIName: 'safCsi=AmfDemo?,safSi=SI6,safApp=AmfDemo?' HAState: Standby CSIFlags: Add One, standbyRank=0, activeCompName=safComp=AmfDemo?,safSu=SU1,safSg=AmfDemo?,safApp=AmfDemo? 
[tickets] [opensaf:tickets] #168 saAmfComponentErrorReport() - errorDetectionTime not initialized by library
- **Type**: defect --> enhancement

---

** [tickets:#168] saAmfComponentErrorReport() - errorDetectionTime not initialized by library**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 05:37 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 05:37 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/1944

The AMF spec states that the library should initialize the absolute time at which an error was reported. Today this is done by amfnd, not by the library. The recorded error time can therefore be inaccurate, since it depends on the amfnd load.
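The difference #168 describes is where the timestamp is taken. The sketch below stamps it in the library call and lets the daemon keep it. Both function names and the message layout are hypothetical, and treating a detection time of 0 as "not set by the caller" is an assumption made for this example.

```python
def library_error_report(comp_name, detection_time, clock):
    """Library side: fill in the detection time if the caller left it at 0."""
    if detection_time == 0:
        detection_time = clock()
    return {"comp": comp_name, "errorDetectionTime": detection_time}


def daemon_handle(msg, clock):
    """Daemon side: keep the library timestamp, only add a receipt time."""
    return dict(msg, receivedAt=clock())
```

With this split, a busy daemon delays only `receivedAt`; `errorDetectionTime` reflects when the library was called.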
[tickets] [opensaf:tickets] #178 escalation policy is not happening till the restart count exceeds, instead of reaching saAmfSGCompRestartMax for NPI components
- **Type**: defect --> enhancement

---

** [tickets:#178] escalation policy is not happening till the restart count exceeds, instead of reaching saAmfSGCompRestartMax for NPI components**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:24 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 06:24 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2144

For components brought up as NPI, error escalation does not happen until the restart count exceeds saAmfSGCompRestartMax. According to the spec, the first level of escalation should happen when the restart count reaches saAmfSGCompRestartMax.

From the spec, section 3.11.2.2, page 203:

"If this count reaches the saAmfSGCompRestartMax value before the end of the component restart probation period, the Availability Management Framework performs the first level of recovery escalation for that service unit: the Availability Management Framework restarts the entire service unit"
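The gap between "exceeds" and "reaches" in #178 is an off-by-one in the comparison. A sketch with two hypothetical predicates (neither is taken from the AMF sources):

```python
def escalates_when_exceeded(restart_count, restart_max):
    """Behaviour reported in the ticket: escalate only once the max is exceeded."""
    return restart_count > restart_max


def escalates_when_reached(restart_count, restart_max):
    """Behaviour the quoted spec text asks for: escalate when the max is reached."""
    return restart_count >= restart_max
```

With `restart_max = 3`, the third restart triggers escalation only under the second predicate.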
[tickets] [opensaf:tickets] #183 component's operational state and SU's presence state are not updated in the multiple NPI components instantiation failure
- **Type**: defect --> enhancement

---

** [tickets:#183] component's operational state and SU's presence state are not updated in the multiple NPI components instantiation failure**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 07:00 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 07:00 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2178

1. Unlocked the NPI SU, which has 4 components.
2. The first two components were instantiated properly; instantiation of the third component failed.
3. AMF tried to clean up the third component. The cleanup failed and the third component moved to instantiation-failure.
4. AMF should now terminate the instantiated components, and the presence state of the SU and the components should be set to TERMINATING. In the current implementation the presence state of the SU is not updated.
5. AMF tried to terminate the first and second components. Termination failed and cleanup also failed, hence termination failure for the first and second components. This is OK and in line with section 3.2 on page 62.

However, according to spec section 4.8, once the presence state is set to termination-failed, the component's operational state should be set to DISABLED, which is not done in the current implementation.

Finally, the SU's presence state should be set to termination-failure, which is also not updated in the current implementation.
[tickets] [opensaf:tickets] #181 aAmfPmStart and Stop APIs with pmErrors = SA_AMF_PM_ABNORMAL_END gives ERR_INVALID_PARAM instead of OK.
- **Type**: defect --> enhancement

---

** [tickets:#181] aAmfPmStart and Stop APIs with pmErrors = SA_AMF_PM_ABNORMAL_END gives ERR_INVALID_PARAM instead of OK.**

**Status:** assigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Aug 27, 2013 06:54 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2147

Passing pmErrors as SA_AMF_PM_ABNORMAL_END returns ERR_INVALID_PARAM instead of OK.

The value as per the spec:

    #define SA_AMF_PM_ABNORMAL_END 0x4
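A parameter check that accepts SA_AMF_PM_ABNORMAL_END can be sketched as a bitmask test. Only the 0x4 value comes from the ticket; the other two flag values are my recollection of the AMF header and should be treated as assumptions, and `pm_errors_valid` is a hypothetical helper, not the library's code.

```python
SA_AMF_PM_ZERO_EXIT = 0x1       # assumed value, not stated in the ticket
SA_AMF_PM_NON_ZERO_EXIT = 0x2   # assumed value, not stated in the ticket
SA_AMF_PM_ABNORMAL_END = 0x4    # value quoted in the ticket

ALL_PM_ERRORS = SA_AMF_PM_ZERO_EXIT | SA_AMF_PM_NON_ZERO_EXIT | SA_AMF_PM_ABNORMAL_END


def pm_errors_valid(pm_errors):
    """Accept any non-empty combination of the known flags, reject unknown bits."""
    return pm_errors != 0 and (pm_errors & ~ALL_PM_ERRORS) == 0
```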
[tickets] [opensaf:tickets] #185 AMF: unnecessary/wrong updates of pure runtime attributes
- **Type**: defect --> enhancement

---

** [tickets:#185] AMF: unnecessary/wrong updates of pure runtime attributes**

**Status:** accepted
**Milestone:** future
**Created:** Tue May 14, 2013 07:20 AM UTC by Nagendra Kumar
**Last Updated:** Mon Jun 02, 2014 05:39 AM UTC
**Owner:** Gary Lee

Migrated from http://devel.opensaf.org/ticket/2227

saAmfCompRestartCount
saAmfCompCurrProxyName

function avd_data_update_req_evh() in avd_ndproc.c

Pure runtime attributes should only be updated by the callback.
[tickets] [opensaf:tickets] #188 Use of pkill terminates the CLC-SCRIPT with a signal making amf think the component termination failed.
- **Type**: defect --> enhancement

---

** [tickets:#188] Use of pkill terminates the CLC-SCRIPT with a signal making amf think the component termination failed.**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 07:41 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 07:41 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2330

Scenario:

In a restartable component, make the component return an error in any of the callbacks. AMF will try to terminate the component. If the termination script uses kill -9 on the PID of the component, the termination succeeds. If pkill is used instead, the script exits with a signal and AMF moves the SU to the Termination-failed state.

Snippet from /var/log/messages:

    Cleanup of 'safComp=pxyXAppSiorder1,safSu=SU_pxyXAppSiorder1,safSg=SG_pxyXAppSiorder,safApp=pxyXAppSiorderApp' failed
    Nov 21 19:38:17 SLES11-SLOT-2 osafamfnd[5944]: Reason:'Exec of script success, but script exits due to a signal'
    Nov 21 19:38:17 SLES11-SLOT-2 osafamfnd[5944]: Signal: 15, CLC CLI script:'/home/surender/amf/term_proxy.sh'
    Nov 21 19:38:17 SLES11-SLOT-2 osafamfnd[5944]: 'safSu=SU_pxyXAppSiorder1,safSg=SG_pxyXAppSiorder,safApp=pxyXAppSiorderApp' Presence State TERMINATING => TERMINATION_FAILED

Note: The component has not registered any signal handlers.

changeset: 3047

Changed 18 months ago by hafe:

Wouldn't that be the case if the script and the program binary have the same name?

Changed 18 months ago by surenderk, in reply to hafe:

> Wouldn't that be the case if the script and the program binary have the same name?

The program binary and the script have different names.
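The log in #188 distinguishes a script that returned an exit code from one that was killed by a signal. The sketch below shows how a parent process can tell the two apart with Python's `subprocess`. It is an illustration on a POSIX system with `sh` available, not the amfnd implementation, and `run_script` is a hypothetical helper.

```python
import signal
import subprocess


def run_script(argv):
    """Run a command and report whether it exited normally or died from a signal."""
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX, a negative return code means "terminated by signal -returncode".
        return ("signal", signal.Signals(-proc.returncode).name)
    return ("exit", proc.returncode)
```

A script that ends up sending SIGTERM to itself, for example because a pattern-based kill also matches the script's own process, is reported as `("signal", "SIGTERM")` even though the command itself started fine.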
[tickets] [opensaf:tickets] #174 admin unlock operation on SU in shutting down state should be succeded
- **Type**: defect --> enhancement

---

** [tickets:#174] admin unlock operation on SU in shutting down state should be succeded**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:16 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 06:16 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2063

Perform a shutdown operation on an SU that is already up in the 2N model and has an active HA assignment. Respond to the CSI set quiescing callback using saAmfResponse_4 only. Without calling the saAmfQuiescingComplete API, perform the unlock operation on the SU, which is in the shutting-down state.

The unlock operation should succeed according to the spec (section 9.4.2, page 370):

"The invocation of this administrative operation transitions the administrative state of the logical entity designated by the name to which objectName points to unlocked, provided that the logical entity was previously in the locked or shutting-down administrative state."

Currently, when the unlock operation is issued on an SU in the shutting-down state, it returns ERR_TRY_AGAIN.

    Sep 16 17:45:23 SLES11-SLOT-1 osafamfd[24860]: Admin operation is already going

The following also needs to be considered while the SU is in the shutting-down state:

1. saAmfSUReadinessState should transition to shutting down and then to out of service when the quiescing operation is completed. Currently, saAmfSUReadinessState is set to out of service as soon as the shutdown operation is performed. (page 99)
2. saAmfSISUHAState should be set to quiescing or quiesced accordingly while the shutdown operation is in progress. Currently, saAmfSISUHAState stays active until the shutdown operation is completed. (page 99)
3. Whenever any other admin operation is performed on the SU in the shutting-down state, it should return BAD_OPERATION or a corresponding error. Currently it returns TRY_AGAIN, which can be given in any context.
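The spec text quoted in #174 allows unlock from exactly two administrative states. A minimal sketch of that precondition: the state names are plain strings, `admin_unlock` is a hypothetical function, and "REJECTED" is a placeholder rather than a specific SAF error code.

```python
UNLOCK_ALLOWED_FROM = ("LOCKED", "SHUTTING_DOWN")


def admin_unlock(admin_state):
    """Return (new_state, result) for an unlock request in the given admin state."""
    if admin_state in UNLOCK_ALLOWED_FROM:
        return ("UNLOCKED", "SA_AIS_OK")
    # Placeholder result: the appropriate error code is not discussed here.
    return (admin_state, "REJECTED")
```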
[tickets] [opensaf:tickets] #173 protection group callback is giving null info when component moves from quiesced to no redundancy
- **Type**: defect --> enhancement

---

** [tickets:#173] protection group callback is giving null info when component moves from quiesced to no redundancy**

**Status:** review
**Milestone:** 4.5.2
**Created:** Tue May 14, 2013 06:13 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jul 08, 2015 11:48 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2034

Brought up 2 SUs in the 2N model. Protection group tracking is started with SA_TRACK_CHANGES.

When the standby SU is locked and then the active SU is locked, three protection group callbacks are expected:

1. For the standby SU lock: numberOfMembers is 1 and notificationBuffer numberOfItems is 2. The standby component's info is filled with the SA_AMF_PROTECTION_GROUP_REMOVED change and haState zero.
2. For the active SU lock (active to quiesced): numberOfMembers is 1 and notificationBuffer numberOfItems is 1. The quiesced component's info is filled with SA_AMF_PROTECTION_GROUP_STATE_CHANGE and haState QUIESCED.
3. For the change from quiesced to no redundancy: numberOfMembers is 0 and notificationBuffer numberOfItems is 1. The old quiesced component's info is filled with the SA_AMF_PROTECTION_GROUP_REMOVED change and haState zero.

In the current implementation, the callback info in the third step is not filled with the old quiesced component's info; numberOfItems is given as zero.

Changed 20 months ago by erannjn:

Can you please explain step 3? I don't really understand quiesced => no redundancy. Are you doing an amf-adm lock on the SI?

Changed 20 months ago by erannjn:

Not able to reproduce this. Please let me know what I am missing or not understanding. Running demo app, 2N, SA_TRACK_CHANGES.
Startup: amf_demo[436]: amf_protection_group_callback(): amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=1, items=1 amf_demo[436]: -- amf_demo[436]: item=0, change=2, comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=1, rank=1 amf_demo[436]: -- amf_demo[436]: amf_protection_group_callback(): amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=2, items=2 amf_demo[436]: -- amf_demo[436]: item=0, change=2, comp=safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1, haState=2, rank=2 amf_demo[436]: item=1, change=1, comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=1, rank=1 amf_demo[436]: -- amf_demo[436]: Dispatched healthCheck 3 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' amf_demo[436]: Dispatched healthCheck 4 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' lock standby SU2: amf_demo[436]: Dispatched healthCheck 5 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' amf_demo[436]: Dispatched healthCheck 6 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' amf_demo[436]: amf_protection_group_callback(): amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=1, items=2 amf_demo[436]: -- amf_demo[436]: item=0, change=1, comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=1, rank=1 amf_demo[436]: item=1, change=3, comp=safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1, haState=0, rank=2 amf_demo[436]: -- amf_demo[436]: Dispatched healthCheck 7 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' amf_demo[436]: Dispatched healthCheck 8 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Lock active SU1: amf_demo[436]: Dispatched healthCheck 9 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' amf_demo[436]: Dispatched healthCheck 10 in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' amf_demo[436]: Dispatched 'CSI Set' in 
'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' CSIName: '' HAState: Quiesced CSIFlags: Target All amf_demo[436]: amf_protection_group_callback(): amf_demo[436]: CSI='safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1', members=1, items=1 amf_demo[436]: -- amf_demo[436]: item=0, change=4, comp=safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1, haState=3, rank=1 amf_demo[436]: -- amf_demo[436]: Dispatched 'CSI Remove' in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' CSI: '' CSIFlags: Target All amf_demo[436]: amf_protection_group_callback(): amf_demo[436]:
[tickets] [opensaf:tickets] #177 N+M redandancy model was coming up with assignments even if component capability used is SA_AMF_COMP_ONE_ACTIVE_OR_ONE_STANDBY.
- **Type**: defect --> enhancement

---

** [tickets:#177] N+M redandancy model was coming up with assignments even if component capability used is SA_AMF_COMP_ONE_ACTIVE_OR_ONE_STANDBY.**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 06:21 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 06:21 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2119

As per the AMF B.04.01 spec, section 3.6.3.1, page 132:

"Components implementing any of the capability models described in Section 3.5 on page 107, except the 1_active_or_1_standby capability model, can participate in the N+M redundancy model."

However, I was able to bring up the N+M model with capability 4 and also observed the SUSI and CSI HA assignments.

Configuring the N+M model with capability 4 should be detected during configuration itself and rejected.

    # immlist safSupportedCsType=safVersion=4.0.0\,safCSType=safCsi_NpM1,safVersion=4.0.0,safCompType=Comp_NpMApp_npm_1_1
    Name                          Type          Value(s)
    safSupportedCsType            SA_NAME_T     safSupportedCsType=safVersion=4.0.0\,safCSType=safCsi_NpM1 (58)
    saAmfCtDefNumMaxStandbyCSIs   SA_UINT32_T   1 (0x1)
    saAmfCtDefNumMaxActiveCSIs    SA_UINT32_T   1 (0x1)
    saAmfCtCompCapability         SA_UINT32_T   4 (0x4)
    SaImmAttrImplementerName      SA_STRING_T   safAmfService
    SaImmAttrClassName            SA_STRING_T   SaAmfCtCsType
    SaImmAttrAdminOwnerName       SA_STRING_T   <Empty>

    linux-xc76:/opt/goahead/tetware/opensaffire/suites/avsv/regress/imm_auto # amf-state siass ha
    safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=d_NplusM_1Norm_2\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_3,safApp=NpMApp
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=d_NplusM_1Norm_1\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_1,safApp=NpMApp
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=d_NplusM_1Norm_1\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_2,safApp=NpMApp
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=d_NplusM_1Norm_5\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_5,safApp=NpMApp
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=d_NplusM_1Norm_2\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_4,safApp=NpMApp
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=d_NplusM_1Norm_3\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_5,safApp=NpMApp
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=d_NplusM_1Norm_3\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_6,safApp=NpMApp
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_1,safApp=NpMApp
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_2,safApp=NpMApp
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_3,safApp=NpMApp
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_4,safApp=NpMApp
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=d_NplusM_1Norm_5\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_6,safApp=NpMApp
        saAmfSISUHAState=STANDBY(2)
    linux-xc76:/opt/goahead/tetware/opensaffire/suites/avsv/regress/imm_auto #
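The rejection asked for in #177 could be expressed as a configuration-time check. This is a sketch only: the capability value 4 is the one shown in the ticket's immlist output, while `validate_sg`, the "N+M" model label and the input shape are assumptions made for this example.

```python
ONE_ACTIVE_OR_ONE_STANDBY = 4   # saAmfCtCompCapability value seen in the ticket


def validate_sg(red_model, comp_capabilities):
    """Return a list of error strings; an empty list means the config is accepted.

    comp_capabilities maps a component type name to its capability value.
    """
    errors = []
    if red_model == "N+M":
        for comp_type, capability in comp_capabilities.items():
            if capability == ONE_ACTIVE_OR_ONE_STANDBY:
                errors.append(
                    comp_type + ": 1_active_or_1_standby is not allowed in N+M"
                )
    return errors
```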
[tickets] [opensaf:tickets] #172 AVSv:Loops in the csi dependencies should be checked during config validatation and rejetcted
- **Type**: defect --> enhancement

---

** [tickets:#172] AVSv:Loops in the csi dependencies should be checked during config validatation and rejetcted**

**Status:** review
**Milestone:** 4.5.2
**Created:** Tue May 14, 2013 06:10 AM UTC by Nagendra Kumar
**Last Updated:** Fri Jul 10, 2015 04:46 AM UTC
**Owner:** Praveen

Migrated from http://devel.opensaf.org/ticket/2025

At present, loops are detected when CSI dependencies are added, and an assert is triggered if one is found. Loops should instead be checked during configuration validation, and the configuration rejected.
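Checking for dependency loops at validation time is a standard cycle search on a directed graph. The sketch below uses a depth-first search with three node states. `find_cycle` and the dictionary input are assumptions made for illustration, not the AMF data structures.

```python
def find_cycle(deps):
    """deps maps a CSI name to the CSIs it depends on.

    Returns one cycle as a list of names (first == last), or None if acyclic.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in deps.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: a loop is closed here.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

A validator could call this on the proposed dependency set and reject the change when the result is not `None`, instead of asserting later.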
[tickets] [opensaf:tickets] #163 AMF: Auto-adjust for standby assignments in NWay red. model does not work
- **Type**: defect --> enhancement

---

** [tickets:#163] AMF: Auto-adjust for standby assignments in NWay red. model does not work**

**Status:** unassigned
**Milestone:** future
**Created:** Tue May 14, 2013 04:39 AM UTC by Nagendra Kumar
**Last Updated:** Tue May 14, 2013 04:39 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/1763

Tested using the UML environment and samples/avsv/campaigns/NwayInstallationCampaign.xml, with saAmfSGMaxActiveSIsperSU, saAmfSGMaxStandbySIsperSU and saAmfSIPrefStandbyAssignments adjusted to one.

    safSi=Nway-0,safApp=AmfDemoNway
        saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
    safSi=Nway-1,safApp=AmfDemoNway
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)
    safSi=Nway-2,safApp=AmfDemoNway
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)

    safSISU=safSu=AmfDemoNway-0\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-1,safApp=AmfDemoNway
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=AmfDemoNway-0\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-2,safApp=AmfDemoNway
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=AmfDemoNway-1\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-1,safApp=AmfDemoNway
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=AmfDemoNway-1\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-2,safApp=AmfDemoNway
        saAmfSISUHAState=STANDBY(2)
    safSISU=safSu=AmfDemoNway-2\,safSg=SGNway\,safApp=AmfDemoNway,safSi=Nway-0,safApp=AmfDemoNway
        saAmfSISUHAState=ACTIVE(1)

safSi=Nway-0,safApp=AmfDemoNway has no standby assignment.

Related to http://devel.opensaf.org/ticket/1746 (see discussion)
[tickets] [opensaf:tickets] #164 AMF does not validate existence in model of SU for SaAmfSIRankedSU objects
- **Type**: defect --> enhancement

---

** [tickets:#164] AMF does not validate existence in model of SU for SaAmfSIRankedSU objects**

**Status:** review
**Milestone:** 4.5.2
**Created:** Tue May 14, 2013 04:41 AM UTC by Nagendra Kumar
**Last Updated:** Wed Jul 01, 2015 07:22 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/1785

If the SU referenced by an SaAmfSIRankedSU object does not exist, AMF reports no error and actually starts using the configuration.
[tickets] [opensaf:tickets] #166 AMF: creating an SaAmfSIRankedSU object with rank==0 causes inconsistence
- **Type**: defect -- enhancement --- ** [tickets:#166] AMF: creating an SaAmfSIRankedSU object with rank==0 causes inconsistence** **Status:** unassigned **Milestone:** future **Created:** Tue May 14, 2013 04:48 AM UTC by Nagendra Kumar **Last Updated:** Tue May 14, 2013 04:48 AM UTC **Owner:** nobody Migrated from http://devel.opensaf.org/ticket/1789 An SaAmfSIRankedSU object with rank 0 can be created but not changed or deleted. The way AMF implements its SaAmfSIRankedSU DB is suspect and should be redesigned to store SaAmfSIRankedSU objects in a data structure indexed by DN instead of by the key SI-rank.
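A rough illustration of the DN-indexed idea from this ticket, in Python rather than amfd's own code. Everything here is hypothetical: the class `RankedSuStore`, its method names, and the placeholder DN strings are invented for the sketch and do not correspond to any AMF data structure. The point is only that when the object's DN is the key, a rank of 0 is just another value and does not affect lookup, modification, or deletion.

```python
class RankedSuStore:
    """Toy store for SaAmfSIRankedSU-like entries, keyed by object DN (hypothetical)."""

    def __init__(self):
        self._by_dn = {}

    def create(self, dn, si_dn, rank):
        if dn in self._by_dn:
            raise KeyError("object already exists: %s" % dn)
        self._by_dn[dn] = {"si": si_dn, "rank": rank}

    def modify_rank(self, dn, rank):
        # Lookup goes through the DN, so the old rank value is irrelevant.
        self._by_dn[dn]["rank"] = rank

    def delete(self, dn):
        del self._by_dn[dn]

    def ranked_for_si(self, si_dn):
        """Return the DNs that belong to one SI, ordered by ascending rank."""
        entries = [(e["rank"], dn) for dn, e in self._by_dn.items() if e["si"] == si_dn]
        return [dn for _, dn in sorted(entries)]


store = RankedSuStore()
store.create("ranked-su-A", "si-1", 0)   # rank 0 is accepted like any other rank
store.create("ranked-su-B", "si-1", 2)
store.modify_rank("ranked-su-A", 5)      # and can still be changed ...
store.delete("ranked-su-B")              # ... or deleted
```

Ordering by rank is derived on demand in `ranked_for_si`, so the rank never has to serve as part of the key.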
[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag
- **status**: assigned -- review --- ** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag** **Status:** review **Milestone:** 4.6.1 **Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson **Last Updated:** Tue Jul 14, 2015 07:00 AM UTC **Owner:** Johan Mårtensson The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user code to make changes to IMM via CCBs. It unconditionally sets the SA_IMM_CCB_REGISTERED_FLAG which means that any code that requires the flag to be unset must use the low-level interface instead. The Ccb class should be enhanced to allow turning SA_IMM_CCB_REGISTERED off.
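One way the requested switch might be shaped, shown as a stand-alone sketch. This is not the pyosaf `Ccb` class: `SketchCcb`, the `flags` keyword, and the placeholder constant `CCB_REGISTERED` (with an arbitrary value chosen for the example) are all assumptions made for illustration. The idea is simply a constructor argument that defaults to the current behaviour, so existing callers are unaffected.

```python
# Placeholder constant for this sketch only; the real flag lives in the IMM OM bindings.
CCB_REGISTERED = 0x1


class SketchCcb:
    """Stand-in for a CCB wrapper whose flag can be chosen by the caller (hypothetical)."""

    def __init__(self, flags=CCB_REGISTERED):
        # Default keeps the flag set, matching the behaviour described in the ticket.
        self.flags = flags

    def initialize_args(self):
        """Return the flags that would be handed to the low-level CCB initialize call."""
        return self.flags


default_ccb = SketchCcb()            # flag set, as before
unregistered_ccb = SketchCcb(flags=0)  # caller explicitly turns the flag off
```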
[tickets] [opensaf:tickets] #1418 pyosaf: Add convenience method to clear admin owner role on a set of objects
- **status**: assigned -- review --- ** [tickets:#1418] pyosaf: Add convenience method to clear admin owner role on a set of objects** **Status:** review **Milestone:** 4.7-Tentative **Created:** Tue Jul 14, 2015 08:03 AM UTC by Johan Mårtensson **Last Updated:** Tue Jul 14, 2015 08:38 AM UTC **Owner:** Johan Mårtensson The Ccb class is very convenient for performing IMM changes but it does not clear the admin owner role after apply. This should be added so that user code does not have to fall back to low-level C marshalling to clean up after the CCB.
[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag
- **Milestone**: 4.6.1 -- 4.7-Tentative --- ** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag** **Status:** review **Milestone:** 4.7-Tentative **Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson **Last Updated:** Wed Jul 15, 2015 12:01 PM UTC **Owner:** Johan Mårtensson The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user code to make changes to IMM via CCBs. It unconditionally sets the SA_IMM_CCB_REGISTERED_FLAG which means that any code that requires the flag to be unset must use the low-level interface instead. The Ccb class should be enhanced to allow turning SA_IMM_CCB_REGISTERED off.
[tickets] [opensaf:tickets] #1414 amfd: N+M wrong assigment with SI dependency and role failure
- **Milestone**: future -- 4.5.2 --- ** [tickets:#1414] amfd: N+M wrong assigment with SI dependency and role failure** **Status:** review **Milestone:** 4.5.2 **Created:** Mon Jul 13, 2015 05:32 PM UTC by Alex Jones **Last Updated:** Wed Jul 15, 2015 09:04 AM UTC **Owner:** Alex Jones
Given the following setup with 6 nodes:
1) 2N SG on nodes 1 and 2
2) N+1 SG on all nodes with SI dependencies for all its SIs with above 2N SI.
3) controllers on nodes 1 and 2 (also hosting payload SGs from 1 and 2)
4) Node 1 has active controller, active 2N assignment, and active N+1 assignment
5) Node 2 has standby controller, standby 2N assignment, and 5 N+M standby assignments
If I hard reset node 1, its active N+1 SI gets assigned to another SU that already has an active assignment, which is illegal. And when node 1 comes back up, it gets no standby N+1 assignments. It should get all the standby assignments for the N+1 SG.
[tickets] [opensaf:tickets] #1410 pyosaf: Invalid exception used in ImmObject (object.py)
- **Milestone**: 4.6.1 -- 4.5.2 --- ** [tickets:#1410] pyosaf: Invalid exception used in ImmObject (object.py)** **Status:** unassigned **Milestone:** 4.5.2 **Created:** Fri Jul 10, 2015 10:11 AM UTC by Johan Mårtensson **Last Updated:** Fri Jul 10, 2015 10:11 AM UTC **Owner:** nobody
ImmObject uses an invalid way to raise exceptions:
a = ImmObject('NonExistingClass')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pyosaf/utils/immom/object.py", line 63, in __init__
    raise
TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType
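The traceback above is what a bare `raise` produces when no exception is currently being handled. A minimal sketch of the difference follows. `SketchImmObject` and its `known_classes` argument are invented for the example and are not the pyosaf implementation; the exception type a real fix should use is likewise left open here.

```python
def bare_raise():
    # A bare `raise` only makes sense inside an `except` block. Outside one,
    # Python 2.7 reports the TypeError shown in the ticket, and Python 3
    # reports a RuntimeError instead.
    raise


class SketchImmObject:
    """Stand-in object that rejects unknown class names with an explicit exception."""

    def __init__(self, class_name, known_classes=("SaAmfSU",)):
        if class_name not in known_classes:
            # Raising an exception instance gives the caller a usable message.
            raise ValueError("unknown IMM class: %s" % class_name)
        self.class_name = class_name


try:
    SketchImmObject("NonExistingClass")
except ValueError as err:
    message = str(err)   # "unknown IMM class: NonExistingClass"
```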
[tickets] [opensaf:tickets] #1410 pyosaf: Invalid exception used in ImmObject (object.py)
- **Version**: 4.5.2 -- 4.5 --- ** [tickets:#1410] pyosaf: Invalid exception used in ImmObject (object.py)** **Status:** unassigned **Milestone:** 4.5.2 **Created:** Fri Jul 10, 2015 10:11 AM UTC by Johan Mårtensson **Last Updated:** Wed Jul 15, 2015 12:46 PM UTC **Owner:** nobody
ImmObject uses an invalid way to raise exceptions:
a = ImmObject('NonExistingClass')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pyosaf/utils/immom/object.py", line 63, in __init__
    raise
TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType
[tickets] [opensaf:tickets] #1398 smf: Add capability to redo CCBs that fail
- **Milestone**: 4.7-Tentative -- future --- ** [tickets:#1398] smf: Add capability to redo CCBs that fail ** **Status:** unassigned **Milestone:** future **Created:** Wed Jul 01, 2015 02:07 PM UTC by Rafael **Last Updated:** Mon Jul 13, 2015 09:51 AM UTC **Owner:** nobody
CCBs may fail for a variety of resource-related reasons. SMF campaigns can be made more robust if they are capable of redoing/replaying a CCB that has been aborted. A CCB that is aborted due to a validation error will not succeed when replayed, but no damage will be done either. A CCB that is aborted for resource reasons may succeed when replayed, avoiding the abandonment of the whole campaign.
During the final stages of an upgrade campaign PBE is enabled. PBE is not ready until it attaches, so CCB operations will get TRY_AGAIN in that window. Once the PBE has attached, the IMM is persistent-write-available and CCB operations are allowed again. Any CCB that was started and had operations added *before* the PBE was enabled by a CCB will be a doomed CCB. This is because the CCB generated operations before the PBE was enabled, and thus before the PBE had even started, so the PBE will be unaware of these pre-PBE-enable operations. Such a CCB would fail on an op-count check in the CCB commit processing of that CCB in the PBE.
In 4.7-tentative an enhancement #1261 was implemented in the IMM service to make this abort cleaner, i.e. to avoid the ugly op-count error in the PBE. The PBE generates an admin-operation to abort *all* open CCBs (all CCBs that are active but not critical), just before attaching. The problem was that the first implementation of #1261 resulted in the PBE often attaching as OI *before* the abort of non-critical CCBs had been processed. When the abort requested by the PBE was finally processed, it also aborted innocent CCBs that had actually started *after* the PBE was attached as PBE-OI. The syndrome as such, i.e.
attach of PBE causing the abort of a valid CCB, could still happen on earlier releases but was quite rare. The syslog would then show the op-count error reported by the PBE.
A possible improvement in SMF is to read the runtime attribute opensafImmNostdFlags in the OpenSAF IMM object opensafImm=opensafImm,safApp=safImmService and check that it is not empty, which would mean that PBE is attached. But it is not really clear why this is needed in 4.7-tentative when it was not needed earlier.
CCBs may actually get aborted due to a resource error at any time, and not only in conjunction with PBE enable. A general increase in the robustness of SMF campaigns could be achieved by adding logic for redoing CCBs that fail unexpectedly. If such a CCB was valid, i.e. it was aborted due to a resource error and not a validation error, then it has a high probability of succeeding when retried.
IMM ticket related to this: #1261
Jun 29 10:36:35 SC-2-2 osafimmpbed: IN Admop for aborting CCBs result: 1, immsv returned 1
Jun 29 10:36:35 SC-2-2 osafimmpbed: NO Update epoch 63 committing with ccbId:10185/4294967685
Jun 29 10:36:36 SC-2-2 osafsmfd[4726]: NO CAMP: Start campaign complete actions (95)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Create of PERSISTENT runtime object 'smfRollbackElement=CampComplete,safSmfCampaign=ERIC-CMWUpgrade,safApp=safSmfService' (safSmfCampaign).
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 305 COMMITTED (immcfg_SC-2-1_14718)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 306 COMMITTED (immcfg_SC-2-1_14741)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 307 COMMITTED (immcfg_SC-2-1_14764)
Jun 29 10:36:36 SC-2-2 osafimmnd[4476]: NO Ccb 308 COMMITTED (immcfg_SC-2-1_14787)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 309 COMMITTED (immcfg_SC-2-1_14810)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 310 COMMITTED (immcfg_SC-2-1_14833)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 311 COMMITTED (immcfg_SC-2-1_14856)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 312 COMMITTED (immcfg_SC-2-1_14879)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Create of PERSISTENT runtime object 'smfRollbackElement=ccb_0002,smfRollbackElement=CampComplete,safSmfCampaign=ERIC-CMWUpgrade,safApp=safSmfService' (safSmfCampaign).
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO CCB 313 aborted by: immadm -o 202 safRdn=immManagement,safApp=safImmService
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Timeout while waiting for implementer, aborting ccb:313
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 313 ABORTED (SMFSERVICE)
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA s_info-to_svc == 0 reply context destroyed before this reply could be made
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Failed to send response to agent/client over MDS
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: NO Ccb 313 not in correct state (12) for Apply ignoring request
Jun 29 10:36:37 SC-2-2 osafimmnd[4476]: WA Spurious and
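The redo logic proposed in ticket #1398 could be sketched roughly as follows. This is Python pseudocode-made-runnable, not SMF code: the exception classes `ResourceError` and `ValidationError`, the function `apply_with_redo`, and its parameters are all hypothetical, and how SMF would actually tell a resource abort from a validation abort is an open question that the sketch simply assumes away.

```python
import time


class ResourceError(Exception):
    """CCB aborted for a resource reason (hypothetical classification)."""


class ValidationError(Exception):
    """CCB rejected because its content is invalid (hypothetical classification)."""


def apply_with_redo(build_and_apply, max_attempts=3, delay=0.0):
    """Run build_and_apply(), replaying the whole CCB after a resource abort.

    build_and_apply must rebuild the CCB from scratch on every call, since an
    aborted CCB cannot be reused. Validation errors are not caught, because a
    replay of an invalid CCB would fail in the same way.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return build_and_apply()
        except ResourceError:
            if attempt == max_attempts:
                raise            # give up and let the campaign handle it
            time.sleep(delay)    # back off before replaying the CCB
```

A caller would wrap each campaign step that issues a CCB, for example `apply_with_redo(lambda: create_rollback_element(...))`, where `create_rollback_element` stands for whatever function builds and applies that CCB.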
[tickets] [opensaf:tickets] #1417 pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag
- **Version**: 4.5.2 -- --- ** [tickets:#1417] pyosaf: Add flag to Ccb class to let the user control the SA_IMM_CCB_REGISTERED flag** **Status:** review **Milestone:** 4.7-Tentative **Created:** Tue Jul 14, 2015 07:00 AM UTC by Johan Mårtensson **Last Updated:** Wed Jul 15, 2015 12:42 PM UTC **Owner:** Johan Mårtensson The Ccb class in pyosaf.utils.immom.ccb gives a very convenient way for user code to make changes to IMM via CCBs. It unconditionally sets the SA_IMM_CCB_REGISTERED_FLAG which means that any code that requires the flag to be unset must use the low-level interface instead. The Ccb class should be enhanced to allow turning SA_IMM_CCB_REGISTERED off.
[tickets] [opensaf:tickets] #144 LOG: LOG server hangs with huge log records
- **Type**: defect -- enhancement --- ** [tickets:#144] LOG: LOG server hangs with huge log records** **Status:** unassigned **Milestone:** future **Created:** Mon May 13, 2013 11:08 AM UTC by elunlen **Last Updated:** Mon May 13, 2013 11:48 AM UTC **Owner:** elunlen While testing LOG limits I noticed that with a record size of 64 kB the log server hangs, spinning at 100% CPU. The cause was a never-ending loop in log_stream_write() due to variable truncation: in the call to lgs_format_log_record() the fixedLogRecordSize parameter is uint16 but needs to be uint32 to match the configured value. Migrated from devel.opensaf.org #2705
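The truncation described in this ticket can be reproduced in a few lines. The sketch assumes that "64 kB" means 65536 bytes, and the loop in `iterations_to_write` is a toy stand-in invented for the example, not the actual body of log_stream_write(). It only shows why a size that wraps to zero can keep a "write until done" loop from ever making progress.

```python
import ctypes


def truncate_u16(value):
    """Return value as it would look after being stored in a 16-bit unsigned field."""
    return ctypes.c_uint16(value).value


def iterations_to_write(total_bytes, record_size, max_iterations=1000):
    """Count loop passes needed to cover total_bytes in steps of record_size.

    Returns None if the loop has still not finished after max_iterations,
    which is how this toy model represents a hang.
    """
    written = 0
    for i in range(1, max_iterations + 1):
        written += record_size
        if written >= total_bytes:
            return i
    return None


configured = 65536                      # assumed meaning of "64 kB"
as_uint16 = truncate_u16(configured)    # wraps around to 0
as_uint32 = ctypes.c_uint32(configured).value

stuck = iterations_to_write(configured, as_uint16)   # None: no progress is made
fine = iterations_to_write(configured, as_uint32)    # finishes in one pass
```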
[tickets] [opensaf:tickets] #135 SI unassignment failed for SU on node lock
- **Type**: defect -- enhancement --- ** [tickets:#135] SI unassignment failed for SU on node lock** **Status:** unassigned **Milestone:** future **Created:** Mon May 13, 2013 10:23 AM UTC by surender khetavath **Last Updated:** Tue May 21, 2013 10:02 AM UTC **Owner:** nobody
Changeset : 4241 with 27943117 patch
Model : TwoN
configuration: 1App,1SG,5SUs with 3comps each and 5SIs with 3CSIs each
SU1 has only 2comps i.e Asymmetric configuration
Transport : TCP/ipv6-linklocal
PBE enabled.
si-si dependency configured as : Si1-Si2-SI3-Si4
scenario:
---
Initially SU2(active) is mapped to SC-2 and SU3(standby) mapped to PL-3. Lock the node SC-2. A component in SU2 is made to reject quiescing assignment. Escalation went till SuFailover, but assignments were not removed and SU2 held in terminating presence state.
States after node lock
---
safAmfNode=SC-2,safAmfCluster=myAmfCluster saAmfNodeAdminState=LOCKED(2) saAmfNodeOperState=ENABLED(1)
safSi=TWONSI1,safApp=TWONAPP saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI2,safApp=TWONAPP saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=FULLY_ASSIGNED(2)
safSi=TWONSI5,safApp=TWONAPP saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI3,safApp=TWONAPP saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSi=TWONSI4,safApp=TWONAPP saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
safSu=SU2,safSg=SGONE,safApp=TWONAPP saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=DISABLED(2) saAmfSUPresenceState=TERMINATING(4) saAmfSUReadinessState=OUT-OF-SERVICE(1)
[tickets] [opensaf:tickets] #123 Sample SMF RPM integration
- **Type**: defect -- enhancement --- ** [tickets:#123] Sample SMF RPM integration** **Status:** unassigned **Milestone:** future **Created:** Mon May 13, 2013 08:48 AM UTC by Ingvar Bergström **Last Updated:** Mon May 13, 2013 08:48 AM UTC **Owner:** nobody
http://devel.opensaf.org/ticket/1905
In order to make SMF more usable, OpenSAF should contain a sample integration with RPM. Some ideas:
- an SMF rpm repo, managed by some new scripts
- importing an rpm will create a Bundle object in IMM with install/remove scripts setup properly
- ETF.xml integrated in rpm metadata e.g. the description field of the header.
- ETF.xml is needed since install scripts are needed for restartable resp. non-restartable components.
- non AMF SW is out of scope
- sample campaigns
- sample use of deploying application specific IMM configuration
Even better would be integration with yum or zypper. But one step at a time...
Changed 2 years ago by hafe: status changed from new to accepted
Changed 16 months ago by hafe: milestone changed from 4.2.0.GA to 4.3.GA
Changed 2 months ago by hafe: milestone changed from 4.3.GA to future_releases
Changed 7 weeks ago by hafe: owner changed from hafe to ingber, status changed from accepted to assigned
[tickets] [opensaf:tickets] #6 Amfd crashed on active controller
- **Type**: defect -- enhancement --- ** [tickets:#6] Amfd crashed on active controller** **Status:** unassigned **Milestone:** future **Created:** Mon May 06, 2013 07:21 AM UTC by Nagendra Kumar **Last Updated:** Tue Mar 24, 2015 10:33 AM UTC **Owner:** nobody
Migrated from http://devel.opensaf.org/ticket/3135
Changeset : 4200
Transport : TCP/ipv6 ( link local )
patches : 2794
PBE enabled.
Model : NWAY with Si Dep configured.
configuration : 1SG,5SUs,3comps in each su, 5Sis with 3csi each.
Si Dep : si1(Sponsor) - si2 - si3 - si4
SU1,SU2,SU3,SU4 are mapped to sc-1,sc-2,pl-3,pl-4 resp. su5 is also on pl-5. SC-1 was active and sc-2 standby.
scenario: A campaign was modelled to add one more node pl-5 and SUEXP on PL-5.
/var/log/messages on SC-1:
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafamfd[20569]: avd_su.c:1551: avd_su_dec_curr_stdby_si: Assertion 'su-saAmfSUNumCurrStandbySIs 0' failed.
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafamfnd[20586]: ER AMF director unexpectedly crashed
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafamfnd[20586]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received
Apr 25 12:38:08 OEL-64BIT-SLOT2 opensaf_reboot: Rebooting local node
Apr 25 12:38:08 OEL-64BIT-SLOT2 osafimmnd[20477]: NO Implementer locally disconnected. Marking it as doomed 4 22, 2010f (safAmfService)
GDB output
(gdb) bt
#0 0x003c0be328a5 in raise () from /lib64/libc.so.6
#1 0x003c0be34085 in abort () from /lib64/libc.so.6
#2 0x003321a18fbb in osafassert_fail (file=0x4afe07 avd_su.c, line=1551, func=0x4b0cc0 avd_su_dec_curr_stdby_si, assertion=0x4b0c30 su-saAmfSUNumCurrStandbySIs 0) at sysf_def.c:301
#3 0x0048e90d in avd_su_dec_curr_stdby_si (su=0x1391120) at avd_su.c:1551
#4 0x00490311 in avd_susi_update_assignment_counters (susi=0x13e0ae0, action=AVSV_SUSI_ACT_MOD, current_ha_state=SA_AMF_HA_STANDBY, new_ha_state=SA_AMF_HA_ACTIVE) at avd_siass.c:697
#5 0x0048fffc in avd_susi_mod_send (susi=0x13e0ae0, ha_state=SA_AMF_HA_ACTIVE) at avd_siass.c:616
#6 0x00477c83 in avd_sg_nway_susi_succ_sg_realign (cb=0x6c2d20, su=0x137fc30, susi=0x134c8e0, act=AVSV_SUSI_ACT_DEL, state=SA_AMF_HA_QUIESCED) at avd_sgNWayfsm.c:2590
#7 0x00470aef in avd_sg_nway_susi_sucss_func (cb=0x6c2d20, su=0x137fc30, susi=0x134c8e0, act=AVSV_SUSI_ACT_DEL, state=SA_AMF_HA_QUIESCED) at avd_sgNWayfsm.c:337
#8 0x0047dcd8 in avd_su_si_assign_evh (cb=0x6c2d20, evt=0x7ffcc8001fd0) at avd_sgproc.c:859
#9 0x0043def2 in avd_process_event (cb_now=0x6c2d20, evt=0x7ffcc8001fd0) at avd_proc.c:591
#10 0x0043dc56 in avd_main_proc () at avd_proc.c:507
#11 0x00409c23 in main (argc=2, argv=0x7fff5035b6a8) at amfd_main.c:47