Re: [devel] [PATCH 1 of 1] v4 amfnd: Not all npi-components are restarted during SuRestart escalation [#885]
Ack.

Thanks
-Nagu

-----Original Message-----
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
Sent: 15 May 2014 18:58
To: Nagendra Kumar; hans.nordeb...@ericsson.com; hans.fe...@ericsson.com; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] v4 amfnd: Not all npi-components are restarted during SuRestart escalation [#885]

 osaf/services/saf/amf/amfnd/susm.cc |  16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

In case of npi su restart recovery, this problem appears because the condition of changing su presence state to INSTANTIATED is not sufficient. If the component of the last csi in csi_list has been restarted first due to componentRestart (before suRestart), amfnd can not find any next csi and changes the su presence state to INSTANTIATED, this will miss the restart for the rest of components.

The fix uses UNASSIGNED csi to avoid duplication of restarting the same component many times during su restart, only component linked to UNASSIGNED csi is restarted. Therefore, the condition of changing su presence state to INSTANTIATED should be all csis are ASSIGNED

diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
--- a/osaf/services/saf/amf/amfnd/susm.cc
+++ b/osaf/services/saf/amf/amfnd/susm.cc
@@ -2638,12 +2638,22 @@ uint32_t avnd_su_pres_restart_compinst_h
         /* get the next csi */
         curr_csi = (AVND_COMP_CSI_REC *)m_NCS_DBLIST_FIND_NEXT(&curr_csi->si_dll_node);
-        if (curr_csi) {
+        /* To be taken into restart, the next found csi must be UNASSIGNED (avoid
+         * duplication of restarting component). The component linked to csi must
+         * not be in RESTARTING (avoid the component just coming in component restart
+         * recovery), and not be in INSTANTIATING (avoid the component in progress
+         * of instantiation). But for now the check for INSTANTIATING is not included
+         * because avnd_su_pres_insting_surestart_hdler has not been implemented.
+         */
+        if (curr_csi != NULL &&
+            m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_UNASSIGNED(curr_csi) == true &&
+            curr_csi->comp->pres != SA_AMF_PRESENCE_RESTARTING) {
             /* we have another csi. trigger the comp fsm with RestartEv */
-            rc = avnd_comp_clc_fsm_trigger(cb, curr_csi->comp, AVND_COMP_CLC_PRES_FSM_EV_RESTART);
+            rc = avnd_comp_clc_fsm_trigger(cb, curr_csi->comp,
+                        AVND_COMP_CLC_PRES_FSM_EV_RESTART);
             if (NCSCC_RC_SUCCESS != rc)
                 goto done;
-        } else {
+        } else if (all_csis_in_assigned_state(su)) {
             /* => si assignment done */
             avnd_su_pres_state_set(su, SA_AMF_PRESENCE_INSTANTIATED);
         }

___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
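For readers who do not have susm.cc at hand, the decision this patch changes can be sketched in isolation. The following is only a simplified model under stated assumptions: `Csi`, `CsiState`, `CompPres`, `Action` and `next_step` are hypothetical names, not amfnd types, and the CSI list is reduced to a plain vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the amfnd records (not the real types).
enum class CsiState { Unassigned, Assigned };
enum class CompPres { Instantiated, Restarting };

struct Csi {
    CsiState state;      // assignment state of this CSI
    CompPres comp_pres;  // presence state of the component serving this CSI
};

enum class Action { RestartNextComp, SetSuInstantiated, Wait };

// Decide what to do once the component serving csis[curr] is instantiated again
// during an SU restart.
Action next_step(const std::vector<Csi>& csis, std::size_t curr) {
    const std::size_t next = curr + 1;
    // Restart the next component only if its CSI is still UNASSIGNED and the
    // component is not already in its own component-restart recovery.
    if (next < csis.size() &&
        csis[next].state == CsiState::Unassigned &&
        csis[next].comp_pres != CompPres::Restarting) {
        return Action::RestartNextComp;
    }
    // No eligible next CSI: the SU may become INSTANTIATED only when every
    // CSI is ASSIGNED, not merely because the list ran out.
    for (const Csi& c : csis) {
        if (c.state != CsiState::Assigned) {
            return Action::Wait;
        }
    }
    return Action::SetSuInstantiated;
}
```

With the pre-patch condition, reaching the end of the list was enough to mark the SU INSTANTIATED. In this model that case returns `Wait` as long as any CSI is still UNASSIGNED.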
Re: [devel] [PATCH 1 of 1] amfd: allow change to saAmfSGNumPrefActiveSUs for N+M SG while UNLOCKED [#871]
Since the patch retains the old value in case of delete, that should be OK when the SG is unlocked. A notice/warning log should be kept to inform the user about this.

Thanks
-Nagu

-----Original Message-----
From: Hans Feldt [mailto:hans.fe...@ericsson.com]
Sent: 15 May 2014 17:45
To: Nagendra Kumar; Alex Jones; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] amfd: allow change to saAmfSGNumPrefActiveSUs for N+M SG while UNLOCKED [#871]

Yeah OK, but the default is one (1). Should it be reverted to that, or is it so that we can't allow this value to be decreased?

/Hans

-----Original Message-----
From: Nagendra Kumar [mailto:nagendr...@oracle.com]
Sent: 15 May 2014 14:00
To: Hans Feldt; Alex Jones; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] amfd: allow change to saAmfSGNumPrefActiveSUs for N+M SG while UNLOCKED [#871]

There is no value corresponding to it in sgtype.

-----Original Message-----
From: Hans Feldt [mailto:hans.fe...@ericsson.com]
Sent: 15 May 2014 17:22
To: Nagendra Kumar; Alex Jones; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] amfd: allow change to saAmfSGNumPrefActiveSUs for N+M SG while UNLOCKED [#871]

if (value_is_deleted)
        sg->saAmfSGNumPrefActiveSUs = sg->saAmfSGNumPrefActiveSUs;

Shouldn't it reset the value from the SgType?

Thanks,
Hans

-----Original Message-----
From: Nagendra Kumar [mailto:nagendr...@oracle.com]
Sent: 15 May 2014 13:56
To: Alex Jones; Hans Feldt; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] amfd: allow change to saAmfSGNumPrefActiveSUs for N+M SG while UNLOCKED [#871]

Ok, keep a notice log.
Ack.

Thanks
-Nagu

-----Original Message-----
From: Alex Jones [mailto:ajo...@genband.com]
Sent: 13 May 2014 20:29
To: Nagendra Kumar; hans.fe...@ericsson.com; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] amfd: allow change to saAmfSGNumPrefActiveSUs for N+M SG while UNLOCKED [#871]

Oh. That is really to disallow deleting the attribute. Do I even need it?

Alex

On 05/13/2014 02:15 AM, Nagendra Kumar wrote:

Hi Alex,

I couldn't understand the change below:

+ if (value_is_deleted)
+         sg->saAmfSGNumPrefActiveSUs = sg->saAmfSGNumPrefActiveSUs;
+ else

Thanks
-Nagu

-----Original Message-----
From: Alex Jones [mailto:ajo...@genband.com]
Sent: 13 May 2014 00:36
To: hans.fe...@ericsson.com; Nagendra Kumar; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] amfd: allow change to saAmfSGNumPrefActiveSUs for N+M SG while UNLOCKED [#871]

 osaf/services/saf/amf/amfd/sg.cc |  39 +++
 1 files changed, 39 insertions(+), 0 deletions(-)

ccb_completed_modify_hdlr: Attribute 'saAmfSGNumPrefActiveSUs' cannot be modified when SG is unlocked

OpenSAF disallows changing saAmfSGNumPrefActiveSUs while the SG is UNLOCKED. This prevents in-service capacity upgrade for N+M models, which is not desirable.

This change allows saAmfSGNumPrefActiveSUs to be increased for N+M models, as long as there are instantiated spare SUs equal to the amount of the increase. These instantiated spare SUs will then be assigned ACTIVE, and their corresponding CSIs will be assigned to the STANDBY.

diff --git a/osaf/services/saf/amf/amfd/sg.cc b/osaf/services/saf/amf/amfd/sg.cc
--- a/osaf/services/saf/amf/amfd/sg.cc
+++ b/osaf/services/saf/amf/amfd/sg.cc
@@ -675,6 +675,33 @@ static SaAisErrorT ccb_completed_modify_
                     rc = SA_AIS_ERR_BAD_OPERATION;
                     goto done;
                 }
+            } else if (!strcmp(attribute->attrName, "saAmfSGNumPrefActiveSUs")) {
+                uint32_t pref_active_su = *static_cast<SaUint32T *>(value);
+
+                if (sg->sg_redundancy_model != SA_AMF_NPM_REDUNDANCY_MODEL) {
+                    report_ccb_validation_error(opdata,
+                        "%s: saAmfSGNumPrefActiveSUs for non-N+M model cannot"
+                        " be modified when SG is unlocked", __FUNCTION__);
+                    rc = SA_AIS_ERR_BAD_OPERATION;
+                    goto done;
+ } else if (pref_active_su sg-
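The rule described in the commit message can be sketched as a standalone check. This is a minimal sketch, not the code in sg.cc: `SgView`, `RedModel`, `Verdict` and `validate_pref_active_change` are hypothetical names. Three behaviors are assumptions: a locked SG accepts any change, a decrease is rejected, and "at least as many" spare SUs is enough. The quoted diff is cut off before any of these can be confirmed.

```cpp
#include <cstdint>

// Hypothetical, simplified view of an SG; not the amfd AVD_SG structure.
enum class RedModel { TwoN, NPlusM, NWay, NWayActive, NoRedundancy };
enum class Verdict { Ok, BadOperation };

struct SgView {
    RedModel model;
    bool unlocked;                    // admin state is UNLOCKED
    uint32_t pref_active_sus;         // current saAmfSGNumPrefActiveSUs
    uint32_t instantiated_spare_sus;  // spare SUs that are already instantiated
};

// Validate a requested new value for saAmfSGNumPrefActiveSUs.
Verdict validate_pref_active_change(const SgView& sg, uint32_t requested) {
    if (!sg.unlocked) {
        return Verdict::Ok;  // assumption: the restriction only applies while UNLOCKED
    }
    if (sg.model != RedModel::NPlusM) {
        return Verdict::BadOperation;  // only N+M may change while UNLOCKED
    }
    if (requested < sg.pref_active_sus) {
        return Verdict::BadOperation;  // assumption: a decrease is rejected
    }
    const uint32_t increase = requested - sg.pref_active_sus;
    // Each additional active SU must be covered by an instantiated spare SU.
    return sg.instantiated_spare_sus >= increase ? Verdict::Ok
                                                 : Verdict::BadOperation;
}
```

In the patch under review, the delete case (`value_is_deleted`) keeps the current value through a self-assignment, which is a no-op. That is the line Nagu and Hans are asking about in this thread.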
Re: [devel] [PATCH 1 of 1] amfd: Remove asserts from validation routines [#849]
Hi Nagu

I decided to be less restrictive and ignore the unknown attributes, in case the AMF model is extended in the future. E.g. an attribute may be added that does not conflict with the current version, so an application designer may deploy the same configuration on multiple versions of OpenSAF. But perhaps that is not a valid use case.

Thanks
Gary

On 15/05/14 22:10, Nagendra Kumar wrote:

Some comments inlined with Nagu

Thanks
-Nagu

-----Original Message-----
From: Gary Lee [mailto:gary@dektech.com.au]
Sent: 08 May 2014 12:14
To: hans.fe...@ericsson.com; hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] amfd: Remove asserts from validation routines [#849]

 osaf/services/saf/amf/amfd/app.cc        |  18 +++---
 osaf/services/saf/amf/amfd/comp.cc       |   5 +++--
 osaf/services/saf/amf/amfd/compcstype.cc |   5 +++--
 osaf/services/saf/amf/amfd/hlt.cc        |  10 ++
 osaf/services/saf/amf/amfd/sg.cc         |   9 +
 osaf/services/saf/amf/amfd/si.cc         |   5 +++--
 osaf/services/saf/amf/amfd/su.cc         |   5 +++--
 7 files changed, 34 insertions(+), 23 deletions(-)

When an unknown attribute is encountered in various callbacks, sometimes an assert is called. In other cases, the operation is rejected. This may create forward compatibility issues if new attributes are added in the future.

In this patch, the asserts are replaced with a warning and the corresponding attribute change will be permitted but ignored.

diff --git a/osaf/services/saf/amf/amfd/app.cc b/osaf/services/saf/amf/amfd/app.cc
--- a/osaf/services/saf/amf/amfd/app.cc
+++ b/osaf/services/saf/amf/amfd/app.cc
@@ -243,12 +243,15 @@ static SaAisErrorT app_ccb_completed_cb(
                 SaNameT dn = *((SaNameT*)attribute->attrValues[0]);
                 if (NULL == avd_apptype_get(&dn)) {
                     report_ccb_validation_error(opdata, "saAmfAppType '%s' not found", dn.value);
+                    rc = SA_AIS_ERR_BAD_OPERATION;
                     goto done;
                 }
                 rc = SA_AIS_OK;
                 break;
-            } else
-                osafassert(0);
+            } else {
+                LOG_WA("Ignoring unknown attribute '%s'", attribute->attrName);
+                rc = SA_AIS_OK;
+            }
[Nagu]:: Returning ok for bad attribute may not be appropriate. BAD_OP can be returned.
         }
         break;
     case CCBUTIL_DELETE:
@@ -294,10 +297,10 @@ static void app_ccb_apply_cb(CcbUtilOper
                 app->saAmfAppType = *((SaNameT*)attribute->attrValues[0]);
                 app->app_type = avd_apptype_get(&app->saAmfAppType);
                 avd_apptype_add_app(app);
-                break;
             }
-            else
-                osafassert(0);
+            else {
+                TRACE("Ignoring unknown attribute '%s'", attribute->attrName);
+            }
[Nagu]: If above comments are accepted, then this check is not required.
         }
         break;
     }
@@ -396,8 +399,9 @@ static SaAisErrorT app_rt_attr_cb(SaImmO
         if (!strcmp(attributeName, "saAmfApplicationCurrNumSGs")) {
             avd_saImmOiRtObjectUpdate_sync(objectName, attributeName,
                 SA_IMM_ATTR_SAUINT32T, &app->saAmfApplicationCurrNumSGs);
-        } else
-            osafassert(0);
+        } else {
+            LOG_WA("Ignoring unknown attribute '%s'", attributeName);
+        }
     }
     return SA_AIS_OK;
diff --git a/osaf/services/saf/amf/amfd/comp.cc b/osaf/services/saf/amf/amfd/comp.cc
--- a/osaf/services/saf/amf/amfd/comp.cc
+++ b/osaf/services/saf/amf/amfd/comp.cc
@@ -830,8 +830,9 @@ static SaAisErrorT comp_rt_attr_cb(SaImm
             /* TODO */
         } else if (!strcmp("saAmfCompCurrProxiedNames", attributeName)) {
             /* TODO */
-        } else
-            osafassert(0);
+        } else {
+            LOG_WA("Ignoring unknown attribute '%s'", attributeName);
+        }
     }
     return SA_AIS_OK;
diff --git a/osaf/services/saf/amf/amfd/compcstype.cc b/osaf/services/saf/amf/amfd/compcstype.cc
--- a/osaf/services/saf/amf/amfd/compcstype.cc
+++ b/osaf/services/saf/amf/amfd/compcstype.cc
@@ -435,8 +435,9 @@ static SaAisErrorT compcstype_rt_attr_ca
                 SA_IMM_ATTR_SAUINT32T, &cst->saAmfCompNumCurrStandbyCSIs);
         } else if (!strcmp("saAmfCompAssignedCsi", attributeName)) {
             /* TODO */
-
Re: [devel] [PATCH 1 of 1] amfd: Remove asserts from validation routines [#849]
Hi Gary,

I understand that. But from the user's perspective, if AMF returns OK for unsupported attributes, the user will think the operation succeeded when it was actually not done. So my suggestion is to return BAD_OPERATION for those unsupported attributes. This will be in line with the AMF PR document.

Thanks
-Nagu

-----Original Message-----
From: Gary Lee [mailto:gary@dektech.com.au]
Sent: 16 May 2014 11:48
To: Nagendra Kumar; hans.fe...@ericsson.com; hans.nordeb...@ericsson.com; Praveen Malviya
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] amfd: Remove asserts from validation routines [#849]

Hi Nagu

I decided to be less restrictive and ignore the unknown attributes, in case the AMF model is extended in the future. E.g. an attribute may be added that does not conflict with the current version, so an application designer may deploy the same configuration on multiple versions of OpenSAF. But perhaps that is not a valid use case.

Thanks
Gary
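The two positions in this thread differ only in what the validation callback returns for an attribute it does not recognise. A minimal sketch of that choice follows, assuming a hypothetical `validate_attribute` helper and a plain set of known names; none of these identifiers exist in amfd.

```cpp
#include <string>
#include <unordered_set>

enum class Result { Ok, BadOperation };

// Ignore: Gary's patch (accept the CCB and skip the attribute).
// Reject: Nagu's suggestion (fail the CCB with a bad-operation error).
enum class UnknownPolicy { Ignore, Reject };

// Hypothetical validation step for one attribute modification.
Result validate_attribute(const std::string& name,
                          const std::unordered_set<std::string>& known,
                          UnknownPolicy policy) {
    if (known.count(name) != 0) {
        return Result::Ok;  // the real per-attribute checks would run here
    }
    // Unknown attribute: either way the process keeps running, unlike the
    // osafassert(0) that this patch removes.
    return policy == UnknownPolicy::Ignore ? Result::Ok : Result::BadOperation;
}
```

With `Ignore` the same configuration loads on several OpenSAF versions, but the caller sees success for a change that was never applied. With `Reject` the caller is told, at the cost of forward compatibility.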
Re: [devel] [PATCH 1 of 1] smfd: campaign can be committed after cluster reboot in state completed [#906]
Ack.

/ Anders Widell

On 05/15/2014 03:16 PM, Ingvar Bergstrom wrote:

 osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc |  10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

Without this patch a cluster reboot in state execution completed will put the upgrade campaign in a fail state if the old unused versioned types are removed in the campaign wrapup campCompleteAction portion of the campaign. A campaign in fail state can not be committed.

diff --git a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
--- a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
+++ b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
@@ -1315,11 +1315,19 @@ SmfUpgradeProcedure::addStepModification
                          std::multimap<std::string, objectInst> i_objects)
 {
     //This method is called for each calculated step. The purpose is to find out and add the modifications
-    //which shold be carried out for this step. The targetEntityTemplate parent/type part of the procedure (in the campaign)
+    //which should be carried out for this step. The targetEntityTemplate parent/type part of the procedure (in the campaign)
     //is used to match the steps activation/deactivation units.
     //If a match is found the modifications associated with this parent/type shall be added to the step.
     TRACE_ENTER();
+    //Skip this for procedures in state completed, modifications will not be needed if completed.
+    //This can happend if the cluster is rebooted and will fail if the reboot is performed when the
+    //versioned types are removed i.e. during test traffic, if the types was removed in campaign wrapup/complete section.
+    if (getState() == SA_SMF_PROC_COMPLETED) {
+        TRACE_LEAVE();
+        return true;
+    }
+
     std::list < SmfTargetEntityTemplate * >::const_iterator it;
     //For each targetEntityTemplate in the procedure
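The effect of the early return can be shown with a small stand-alone model. This is only an illustration under assumptions: `Procedure`, `ProcState` and `add_step_modifications` are hypothetical names, and the type lookup is reduced to a search in a vector of strings.

```cpp
#include <string>
#include <vector>

enum class ProcState { Initial, Executing, Completed, Failed };

// Hypothetical, simplified procedure; not SmfUpgradeProcedure.
struct Procedure {
    ProcState state = ProcState::Initial;
    std::vector<std::string> available_types;  // versioned types still in the model
};

// Returns false when a type needed by the step can no longer be found,
// which in this model is what would push the campaign into a fail state.
bool add_step_modifications(const Procedure& proc, const std::string& wanted_type) {
    // A completed procedure needs no modifications, so skip the lookup that
    // would fail once wrapup has removed the old versioned types.
    if (proc.state == ProcState::Completed) {
        return true;
    }
    for (const std::string& t : proc.available_types) {
        if (t == wanted_type) {
            return true;
        }
    }
    return false;
}
```

After a cluster reboot the steps are recalculated. In this model a completed procedure returns success without the lookup, so types removed in wrapup no longer produce a failure.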
Re: [devel] [PATCH 3 of 4] mds: use TIPC segmentation/reassembly [#654]
Hi,

This is a series of 4 patches. It is all or nothing. I sent out v2 for patch 1 after Surya’s comments.

So is this an ack for the complete series?

Thanks,
Hans

From: A V Mahesh [mailto:mahesh.va...@oracle.com]
Sent: 16 May 2014 03:51
To: Hans Feldt; Hans Feldt
Cc: opensaf-devel@lists.sourceforge.net; Suryanarayana Garlapati
Subject: Re: [devel] [PATCH 3 of 4] mds: use TIPC segmentation/reassembly [#654]
Importance: High

Hi Hans,

ACK from me for your published patch, please go ahead and push the patch as it is.

I discussed and synced up with Surya; the following are the summary points:

By default MDS does FULL encode/decode between different nodes (inter-node communication); the behavior is the same when MDS messaging happens between 32-bit and 64-bit processes. Your published patch #654 does not impact any of this default functionality.

MDS does provide a compile-time flag called mds_arch. Only when this flag is set explicitly to the same value on different nodes will MDS messaging across nodes use the FLAT encode/decode routines. Even then, most of the OpenSAF services (MDS applications) still do FULL encode/decode when MDS triggers the FLAT encode/decode callbacks (middleware applications are not taking advantage of mds_arch).

Till now none of the OpenSAF users are using the 'mds_arch' field and all are going with the MDS default settings. So there is NO impact from deprecating mds_arch and using it for versions.

-AVM

On 5/7/2014 12:20 PM, SuryaNarayana Garlapati wrote:

OK, here goes the solution.

There is an 8-bit field in each message that is exchanged between the services (it contains the message priority and the mds prot_ver (mds protocol and mds version)) when a message send is attempted. 6 bits are allocated for the mds prot_ver; the present value for this field is 0xA8. This field can be used for versioning (and is intended for this type of change in MDS). This version can be dynamically learnt for a destination when the first message is received from that destination, and is stored in the Subscription result table. The MDS prot_ver value will be changed in this release, so we can identify the old and new ones. If the destination is an old one, we will send with the old logic, and if it is a new one we will send with the bigger message.

There is another solution, but it involves quite a lot of code changes. For example, in each node bind a new type, say X, and in each process subscribe for this X. When we get an up for this, it indicates a new node and messaging is with the bigger size, else the normal one. Only a single extra bind is done per node.

There is one more solution, but it breaks in-service upgradability: the vdest id range is decreased.

Regards
Surya

On Wednesday 07 May 2014 08:44 AM, A V Mahesh wrote:

Surya,

Thanks for reiterating the arch_word feature of MDS, we are all in sync.

On 5/6/2014 3:55 PM, SuryaNarayana Garlapati wrote:

MDS version unless we get alternate bits/variables used for MDS version.

[Surya] That's the reason I am asking for some time.

[AVM] If we get some alternate bits/variables to use for the MDS version, this will be the best option and we can retain the arch_word MDS feature.

On 5/6/2014 3:55 PM, SuryaNarayana Garlapati wrote:

6) This current patch honors the default OpenSAF configuration `mds_arch=0` behavior (point 1 & 2); it removed the hidden `same arch message optimization` feature related code and used those 3 bits/variable for the purpose of MDS version.

[Surya] The current patch still has problems. I am repeating the same, but for your convenience, they are listed below.

1. It is taking out the MDS optimization feature for the performance enhancement of the messaging.

[AVM] We are all in sync on this; now we are at the point of whether we should expose this to OpenSAF users or not.

2. The 64-bit and 32-bit combination will not work on the local node as it is doing flat encode, which is wrong.

[AVM] The current patch does ENC_TYPE_FULL for mixed 32/64-bit clients on the same node.

-AVM

On 5/6/2014 3:55 PM, SuryaNarayana Garlapati wrote:

Before going ahead, following is the explanation of the arch_word of MDS.

Arch word (4 bits) is a combination of the architecture and the bit size of the machine. 3 bits are allocated for the architecture and 1 bit is allocated for the bit size. An architecture value of 0 means unspecified.

Message encoding is done as follows:

1. If the source and destination of the message are the same process, then the copy callback is called.
2. If the source and destination arch_word are the same, then encode flat is called. But there are some exceptions here.

Say there are two nodes, A (Intel) and B (PowerPC), communicating with each other, each 64-bit, and the OpenSAF code was compiled without giving any mds_arch input. In this case mds_arch is set to the default 0. So as per the rule, the encode flat callback should be called. But in this case if the flat encode is called,
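The encoding rules discussed above can be summarised in a short sketch. It is not the MDS implementation. The bit positions inside the 4-bit arch_word are an assumption (the thread only gives the widths), the names `make_arch_word`, `arch_of` and `choose_encoding` are hypothetical, and treating an unspecified architecture (0) as FULL follows AVM's summary of the default behavior rather than the truncated text.

```cpp
#include <cstdint>

enum class Encoding { Copy, Flat, Full };

// Pack a 4-bit arch_word: 3 bits of architecture and 1 bit for the word size.
// Which bit holds the word size is an assumption made for this sketch.
constexpr uint8_t make_arch_word(uint8_t arch, bool is_64bit) {
    return static_cast<uint8_t>(((arch & 0x7u) << 1) | (is_64bit ? 1u : 0u));
}

// Extract the 3-bit architecture id; 0 means "unspecified".
constexpr uint8_t arch_of(uint8_t arch_word) {
    return static_cast<uint8_t>((arch_word >> 1) & 0x7u);
}

// Pick the encode callback for one message.
Encoding choose_encoding(bool same_process, uint8_t src_word, uint8_t dst_word) {
    if (same_process) {
        return Encoding::Copy;  // rule 1: same process uses the copy callback
    }
    // Rule 2 with the default-case exception: FLAT only when both ends declare
    // the same explicitly specified architecture and word size.
    if (arch_of(src_word) != 0 && src_word == dst_word) {
        return Encoding::Flat;
    }
    return Encoding::Full;  // everything else is encoded field by field
}
```

In this model two 64-bit nodes built without an mds_arch setting both carry architecture 0, so they fall through to FULL even though their arch_words are equal.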
Re: [devel] [PATCH 1 of 1] v4 amfnd: Not all npi-components are restarted during SuRestart escalation [#885]
Sorry for being picky but see inline ...

Anyway ack from me, but this could be fixed (by the maintainer) before push.

Thanks,
Hans

-----Original Message-----
From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
Sent: 15 May 2014 15:28
To: nagendr...@oracle.com; Hans Nordebäck; Hans Feldt; praveen.malv...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] v4 amfnd: Not all npi-components are restarted during SuRestart escalation [#885]

[Hans] again the problem is repeated in the short commit message. This one should just say in forward terms what the patch is doing with the code base like:
amfnd: fix SU restart escalation [#885]

 osaf/services/saf/amf/amfnd/susm.cc |  16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

[Hans] problem/symptom should be here (which now is the short commit message)
This is the analysis:

In case of npi su restart recovery, this problem appears because the condition of changing su presence state to INSTANTIATED is not sufficient. If the component of the last csi in csi_list has been restarted first due to componentRestart (before suRestart), amfnd can not find any next csi and changes the su presence state to INSTANTIATED, this will miss the restart for the rest of components.

The fix uses UNASSIGNED csi to avoid duplication of restarting the same component many times during su restart, only component linked to UNASSIGNED csi is restarted. Therefore, the condition of changing su presence state to INSTANTIATED should be all csis are ASSIGNED

diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
--- a/osaf/services/saf/amf/amfnd/susm.cc
+++ b/osaf/services/saf/amf/amfnd/susm.cc
@@ -2638,12 +2638,22 @@ uint32_t avnd_su_pres_restart_compinst_h
         /* get the next csi */
         curr_csi = (AVND_COMP_CSI_REC *)m_NCS_DBLIST_FIND_NEXT(&curr_csi->si_dll_node);
-        if (curr_csi) {
+        /* To be taken into restart, the next found csi must be UNASSIGNED (avoid
+         * duplication of restarting component). The component linked to csi must
+         * not be in RESTARTING (avoid the component just coming in component restart
+         * recovery), and not be in INSTANTIATING (avoid the component in progress
+         * of instantiation). But for now the check for INSTANTIATING is not included
+         * because avnd_su_pres_insting_surestart_hdler has not been implemented.
[Hans] cryptic comment that sounds like a TODO?
+         */
+        if (curr_csi != NULL &&
+            m_AVND_COMP_CSI_CURR_ASSIGN_STATE_IS_UNASSIGNED(curr_csi) == true &&
+            curr_csi->comp->pres != SA_AMF_PRESENCE_RESTARTING) {
[Hans] some parenthesis around the expressions would be nice and is safer
             /* we have another csi. trigger the comp fsm with RestartEv */
-            rc = avnd_comp_clc_fsm_trigger(cb, curr_csi->comp, AVND_COMP_CLC_PRES_FSM_EV_RESTART);
+            rc = avnd_comp_clc_fsm_trigger(cb, curr_csi->comp,
+                        AVND_COMP_CLC_PRES_FSM_EV_RESTART);
             if (NCSCC_RC_SUCCESS != rc)
                 goto done;
-        } else {
+        } else if (all_csis_in_assigned_state(su)) {
[Hans] == true
             /* => si assignment done */
             avnd_su_pres_state_set(su, SA_AMF_PRESENCE_INSTANTIATED);
         }
Re: [devel] [PATCH 3 of 4] mds: use TIPC segmentation/reassembly [#654]
I am getting an IMM sync problem when testing with different controller versions. When the second controller is part of IMM loading it works; when it syncs it does not work:

May 16 12:01:01 SC-2 local0.notice osafimmnd[427]: Started
May 16 12:01:01 SC-2 local0.notice osafimmnd[427]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
May 16 12:01:01 SC-2 local0.notice osafimmnd[427]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
May 16 12:01:01 SC-2 local0.notice osafimmnd[427]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
May 16 12:01:01 SC-2 local0.notice osafimmnd[427]: NO NODE STATE-> IMM_NODE_ISOLATED
May 16 12:01:02 SC-2 local0.notice osafimmd[389]: NO SBY: Ruling epoch noted as:4
May 16 12:01:02 SC-2 local0.notice osafimmd[389]: NO IMMND coord at 2010f
May 16 12:01:02 SC-2 local0.notice osafimmnd[427]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
May 16 12:01:02 SC-2 local0.notice osafimmnd[427]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
May 16 12:01:03 SC-2 local0.notice osafimmd[389]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 3 new epoch:4
May 16 12:01:03 SC-2 local0.notice osafimmd[389]: NO IMMND coord at 2010f
May 16 12:01:03 SC-2 local0.err osafimmnd[427]: ER Can not sync Ccb that is active
May 16 12:01:03 SC-2 local0.err opensafd[337]: ER Could Not RESPAWN IMMND

and trace:

May 16 12:01:20.395920 osafimmnd [455:ImmModel.cc:15360] finalizeSync
May 16 12:01:20.396115 osafimmnd [455:ImmModel.cc:15643] IN finalizeSync message contains 0 admin-owners
May 16 12:01:20.396218 osafimmnd [455:ImmModel.cc:15748] T5 Immnd sync client synced implementer id:6 name:safSmfService size:13
May 16 12:01:20.396522 osafimmnd [455:ImmModel.cc:15748] T5 Immnd sync client synced implementer id:5 name:safCheckPointService size:20
May 16 12:01:20.396615 osafimmnd [455:ImmModel.cc:15748] T5 Immnd sync client synced implementer id:0 name:@safAmfService2020f size:19
May 16 12:01:20.396706 osafimmnd [455:ImmModel.cc:15748] T5 Immnd sync client synced implementer id:3 name:safAmfService size:13
May 16 12:01:20.396798 osafimmnd [455:ImmModel.cc:15748] T5 Immnd sync client synced implementer id:2 name:safClmService size:13
May 16 12:01:20.396887 osafimmnd [455:ImmModel.cc:15748] T5 Immnd sync client synced implementer id:1 name:safLogService size:13
May 16 12:01:20.397029 osafimmnd [455:ImmModel.cc:15752] IN finalizeSync message contains 6 implementers
May 16 12:01:20.397293 osafimmnd [455:ImmModel.cc:15828] IN Synced 58 classes
May 16 12:01:20.397404 osafimmnd [455:ImmModel.cc:15859] T5 CCB 1 state OTHER
May 16 12:01:20.397838 osafimmnd [455:ImmModel.cc:15861] ER Can not sync Ccb that is active
May 16 12:01:20.397938 osafimmnd [455:ImmModel.cc:16173] finalizeSync
May 16 12:01:20.406856 osafimmnd [455:immnd_evt.c:8607] ER Unexpected local error 21 in finalizeSync for sync client - aborting

On 16 May 2014 10:20, A V Mahesh <mahesh.va...@oracle.com> wrote:

Hi,

> So is this an ack for the complete series?

Yes.

-AVM