Hi Praveen,

The component is not able to take assignments even after each successful
cleanup and instantiation phase because certain important configurations
were missing. Configuring it properly takes much longer than the
setCSITimeout value, so it keeps rebooting and then enters a loop of SU
failure, node failure, and rolling reboots of the VM. So basically it
expects the configuration to be fixed while this escalation is happening.

What I wanted was to stop the instantiation with some admin command, then
clean up and load a fresh configuration to start the component again. This
is currently not possible in any way through the SU/Comp admin commands. I
have tried lock-in, shutdown, repaired, restart, etc. on the SU and the
comp, but none of them seems to stop the escalation. Once this happens,
stopping the opensaf service also hangs, causing a premature reboot. I
thought admin commands would be able to override the basic admin commands.

Thanks
Greg

On Wed, Oct 9, 2013 at 9:35 PM, praveen malviya <[email protected]> wrote:

> On 09-Oct-13 6:09 AM, Greg Hurlman wrote:
>
> The other problem is:
>
> When a service unit is unlocked-in and unlocked to make the SA-aware
> components run, and while doing so one of the components faults due to
> 'csiSetcallbackFailed' (recovery is 'ComponentRestart'), then an
> immediate locking of the SU fails with reason as bad operation.
> The component then enters the escalation matrix, going for a SuFailover
> and node restart. There is no admin operation I could find, while the
> error escalation is happening with the SU in the states
>
> saAmfSUAdminState=UNLOCKED(1)
> saAmfSUOperState=ENABLED(1)
> saAmfSUPresenceState=INSTANTIATED(3)
> saAmfSUReadinessState=IN-SERVICE(2)
>
> to bring the SU back to the locked-in or locked state. Only a service
> stop could break out of the escalation matrix, preventing a node failure.
>
> Is this correct behavior or is there any other way to break out while in
> error escalation?
>
> There is no break out from escalation. After each fault AMF cleans up
> the component and re-instantiates it. So the cleanup script must clean
> up all the resources taken up by the component. If AMF gets a successful
> cleanup status, it means the component has released all the system
> resources. If cleanup is unsuccessful, it means some system resources
> were not released. This will lead to an abnormal exit status of the
> cleanup script being returned to AMF, and due to this both the component
> and the SU will move to the TERM_FAILED state. In this state escalation
> will stop and admin intervention is required for repair.
> Please see why the component is not able to take assignments even after
> each successful cleanup and instantiation phase.
> Thanks
> Praveen
>
> Greg
>
> ---------- Forwarded message ----------
> From: Greg Hurlman <[email protected]>
> Date: Mon, Oct 7, 2013 at 2:42 PM
> Subject: Re: [users] Info regd imm configuration
> To: praveen malviya <[email protected]>
> Cc: [email protected]
>
> Thanks Praveen for the suggestion. However, I am not sure I understand
> completely what you mean by "Otherwise let the component move to INST/TERM
> failed state intentionally. In these two states AMF will not take any
> recovery action."
>
> Could you please tell me how to intentionally change the state?
>
> Thanks,
> Greg
>
> On Mon, Oct 7, 2013 at 5:42 AM, praveen malviya <[email protected]> wrote:
>
>> On 04-Oct-13 10:16 AM, Greg Hurlman wrote:
>>
>> Thanks Praveen, that answers my question.
>>
>> Some other issues found while testing version 4.3.1:
>>
>> 1. When a component is made to restart for some valid error scenario,
>> up to the defined saAmfSGCompRestartMax (value=10) with the recovery
>> option set to component restart, it does not honor the attributes
>>
>> saAmfCompNumMaxInstantiateWithoutDelay (value=2),
>> saAmfCompNumMaxInstantiateWithDelay (value=8) and
>> saAmfCompDelayBetweenInstantiateAttempts (value=10000000000).
>>
>> There are tickets for them: #107 and #374.
>>
>> Do these attributes depend on other configurations, or are they
>> overridden by some other attributes?
>>
>> 2. While verifying compDisableRestart, I set compDisableRestart=true
>> when the component was continuously restarting.
>>
>> The result was that instead of component restart it went to SUFailover
>> as the recovery option and then node reboot. I was expecting the
>> behavior to just pause restarting and, when the value is set back to
>> compDisableRestart=false, to start instantiating it again and, if it
>> fails again, continue restarting with the restart count set to zero.
>> Am I interpreting this correctly? Is this attribute dependent on, or
>> being overridden by, some other configuration attribute?
>>
>> compDisableRestart=true means the component is not restart capable, and
>> AMF will perform the next level of recovery, which is suFailover. It
>> does not mean "stop the component from restarting". Please see the spec
>> for more details of such attributes.
>>
>> Basically, for debugging, I was looking for a configuration option to
>> temporarily pause or stop the continuous attempts to instantiate this
>> component as a result of the SU unlock admin operation, when it is not
>> able to come up for a valid error reason. Every other attempt fails and
>> is eventually escalated to node reboot, which prevents me from looking
>> into the issue. Do we have a way to stop this?
>>
>> CallbackTimeout can be configured with very large values, and a
>> breakpoint can be put on the csiset callback to wait for debugging.
>> Otherwise let the component move to INST/TERM failed state intentionally.
>> In these two states AMF will not take any recovery action.
>>
>> Thanks
>> Praveen
>>
>> Are there any obvious errors I am making here?
>>
>> Thanks
>> Greg
>>
>> On Thu, Oct 3, 2013 at 2:17 AM, praveen malviya <[email protected]> wrote:
>>
>>> Please see inline.
>>>
>>> On 03-Oct-13 9:37 AM, Greg Hurlman wrote:
>>>
>>> Thanks Praveen,
>>>
>>> To be precise, I wanted the configuration to form different node
>>> groups out of the payload nodes. Ex: if PL-1, PL-2, PL-3 and PL-4 are
>>> the payload nodes, how do we differentiate the DN names given to
>>> saAmfNGNodeList when PL-1 and PL-2 form one group and PL-3 and PL-4
>>> form the other group? I believe we cannot mention PLs in the DN,
>>> otherwise the DN names for both groups will be the same.
>>>
>>> I tried to make a new group from an existing imm.xml which has 2 SCs
>>> and 2 PLs.
>>> To put SC-1 and PL-3 in one group:
>>> 1) immcfg -c SaAmfNodeGroup safAmfNodeGroup=SC-1andPL-3,safAmfCluster=myAmfCluster -a saAmfNGNodeList=safAmfNode=SC-1,safAmfCluster=myAmfCluster
>>> 2) immcfg safAmfNodeGroup=SC-1andPL-3,safAmfCluster=myAmfCluster -a saAmfNGNodeList+=safAmfNode=PL-3,safAmfCluster=myAmfCluster
>>> 3) immlist safAmfNodeGroup=SC-1andPL-3,safAmfCluster=myAmfCluster gives
>>> Name                      Type         Value(s)
>>> ========================================================================
>>> safAmfNodeGroup           SA_STRING_T  safAmfNodeGroup=SC-1andPL-3
>>> saAmfNGNodeList           SA_NAME_T    safAmfNode=SC-1,safAmfCluster=myAmfCluster (42)
>>>                                        safAmfNode=PL-3,safAmfCluster=myAmfCluster (42)
>>> SaImmAttrImplementerName  SA_STRING_T  safAmfService
>>> SaImmAttrClassName        SA_STRING_T  SaAmfNodeGroup
>>> SaImmAttrAdminOwnerName   SA_STRING_T  <Empty>
>>>
>>> 4) Now in the AMF application, in the SG class:
>>> <attr>
>>> <name>saAmfSGSuHostNodeGroup</name>
>>> <value>safAmfNodeGroup=SC-1andPL-3,safAmfCluster=myAmfCluster</value>
>>> </attr>
>>>
>>> So AMF will spawn SUs on SC-1 and PL-3 for this SG.
>>>
>>> Also could you tell me the configurations for protection group and
>>> service instance assignment with reference to both examples, for both
>>> the 2N and N-way active models?
>>>
>>> I did not understand this question correctly. These configurations are
>>> already in the samples directory of the opensaf tar.
>>> Once these configurations are brought up, the command "amf-state siass"
>>> will show the HA states of SIs in SUs.
>>>
>>> Thanks
>>> Praveen
>>>
>>> Thanks,
>>> Greg
>>>
>>> On Mon, Sep 23, 2013 at 10:41 PM, praveen malviya <[email protected]> wrote:
>>>
>>>> Please see inline.
>>>>
>>>> On 24-Sep-13 2:27 AM, Greg Hurlman wrote:
>>>>
>>>>> Hi Guys,
>>>>>
>>>>> I need some help in understanding how to configure the SG and
>>>>> protection groups for the example scenario below.
>>>>>
>>>>> Referring to spec SAI-AIS-AMF-B.04.01, section 3.1.11 figure 2, and
>>>>> the sample AppConfig-2N.xml, how would I configure the IMM model for
>>>>> the entities below:
>>>>>
>>>>> 1. SG1 spanning node U and V? Do I need to mention something like
>>>>> this or something else?
>>>>>
>>>>> <attr>
>>>>> <name>saAmfSGSuHostNodeGroup</name>
>>>>> <value>safAmfNodeGroup=U,V ,safAmfCluster=myAmfCluster</value>
>>>>> </attr>
>>>>> 2. PG A1 between components C1 and C3.
>>>>> 3. Service instance A, being assigned active to S1 and standby to S2
>>>>>
>>>> 1) Suppose the above-mentioned node group has been created and it
>>>> contains two nodes in its attribute NodeList:
>>>> saAmfNGNodeList  SA_NAME_T  safAmfNode=NodeU,safAmfCluster=myAmfCluster (42)
>>>>                             safAmfNode=NodeV,safAmfCluster=myAmfCluster (42)
>>>>
>>>> 2) Now, in order to map Service Unit S1 to Node U, configure S1 with
>>>> the attribute:
>>>> <attr>
>>>> <name>saAmfSUHostNodeOrNodeGroup</name>
>>>> <value>safAmfNode=NodeU,safAmfCluster=myAmfCluster</value>
>>>> </attr>
>>>>
>>>> Similarly for S2.
>>>>
>>>> So in this way S1 will come up on Node U and S2 will come up on Node V.
>>>> In AppConfig-2N.xml:
>>>> For mapping SU1 to SC-1 add this attribute:
>>>> <attr>
>>>> <name>saAmfSUHostNodeOrNodeGroup</name>
>>>> <value>safAmfNode=SC-1,safAmfCluster=myAmfCluster</value>
>>>> </attr>
>>>>
>>>> For SU2 replace SC-1 by SC-2.
>>>>
>>>> Thanks
>>>> Praveen
>>>>
>>>>> Thanks,
>>>>> Greg
>>>>>
>>>>> _______________________________________________
>>>>> Opensaf-users mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
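
On the node-group question quoted earlier in the thread (PL-1 and PL-2 in one group, PL-3 and PL-4 in the other), the immcfg steps Praveen lists for SC-1andPL-3 could probably be scripted along these lines. This is only a sketch: the helper name node_group_cmds and group names such as PL-1andPL-2 are made up for illustration, the cluster DN is the one used in the thread, and the function only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print (not run) the immcfg/immlist commands for one node group.
# Cluster DN as used in the thread; adjust it for your own imm.xml.
CLUSTER="safAmfCluster=myAmfCluster"

# node_group_cmds GROUP NODE [NODE...]   (hypothetical helper)
node_group_cmds() {
    group_dn="safAmfNodeGroup=$1,${CLUSTER}"
    shift
    first=yes
    for node in "$@"; do
        node_dn="safAmfNode=${node},${CLUSTER}"
        if [ "$first" = yes ]; then
            # The first node creates the SaAmfNodeGroup object.
            echo "immcfg -c SaAmfNodeGroup ${group_dn} -a saAmfNGNodeList=${node_dn}"
            first=no
        else
            # Further nodes are appended to the multi-valued attribute.
            echo "immcfg ${group_dn} -a saAmfNGNodeList+=${node_dn}"
        fi
    done
    # Finally list the object so the result can be checked.
    echo "immlist ${group_dn}"
}
```

Calling node_group_cmds PL-1andPL-2 PL-1 PL-2 and then node_group_cmds PL-3andPL-4 PL-3 PL-4 prints two sets of commands with distinct group DNs, since the group name, not the node list, is what makes each DN unique. The output can be reviewed first and then piped to sh on a node where immcfg is available.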

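
On the question of how to move a component to the TERM_FAILED state on purpose: going by Praveen's explanation that an abnormal exit status from the cleanup script moves both the component and the SU to TERM_FAILED, one option might be a cleanup script that fails deliberately while a marker file exists. The following is a minimal sketch under that assumption. The marker path /tmp/amf_hold_cleanup and the helper names release_resources and cleanup_status are invented for illustration and are not OpenSAF settings.

```shell
#!/bin/sh
# Sketch of a cleanup (CLC-CLI) script body that can be made to fail on demand.
# HOLD_FILE is a hypothetical marker path, not an OpenSAF setting.
HOLD_FILE="${HOLD_FILE:-/tmp/amf_hold_cleanup}"

# Placeholder for the component's real cleanup (stop processes, remove
# pid files, and so on). It must return 0 only if everything was released.
release_resources() {
    return 0
}

# Returns the status the cleanup script should exit with.
cleanup_status() {
    if [ -e "$HOLD_FILE" ]; then
        # Operator asked to stop the restart loop: report a failed cleanup.
        echo "cleanup: $HOLD_FILE present, reporting failure" >&2
        return 1
    fi
    release_resources
}
```

The script would end with cleanup_status; exit $? so that AMF sees the returned status. Creating the marker file before the next fault should then make the cleanup report failure; once the configuration is fixed, the marker has to be removed and the SU repaired by admin operation, as Praveen describes. Note that in this sketch nothing is actually cleaned up while the marker exists, so any leftover resources must be released by hand.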