From: Anand Sundararaj <s.an...@gethighavailability.com>

Summary: amf: support error report on non local component [#109] V2
Review request for Ticket(s): 109
Peer Reviewer(s): Minh, Thang, Nagendra, Paul
Pull request to: Amf Maintainers
Affected branch(es): develop
Development branch: ticket-109
Base revision: 59ded7cdf6a431e522229afd5ecb989e4a61c7d8
Personal repository: git://git.code.sf.net/u/s-anand-has/review

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            y
 OpenSAF services        n
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

NOTE: Patch(es) contain lines longer than 80 characters

Comments (indicate scope for each "y" above):
---------------------------------------------
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision d7cf4c63df3a915b9280f193d438d770cb219f4a
Author: Anand Sundararaj <s.an...@gethighavailability.com>
Date:   Fri, 31 Jul 2020 17:04:17 +0530

amf: support error report on non local component [#109] V2


Complete diffstat:
------------------
 src/amf/amfnd/amfnd.cc    |  22 ++--
 src/amf/amfnd/avnd_cb.h   |   2 +
 src/amf/amfnd/avnd_comp.h |   2 +
 src/amf/amfnd/clm.cc      |  33 +++++-
 src/amf/amfnd/err.cc      |  23 ++++-
 src/amf/amfnd/imm.cc      | 258 ++++++++++++++++++++++++++++++++++++++++------
 6 files changed, 294 insertions(+), 46 deletions(-)


Testing Commands:
-----------------
Configure amf demo on Comp1/SU1 (on SC-1) and Comp2/SU2 (on PL-3).

1. Report an error (saAmfComponentErrorReport_4) from Comp1 running on SC-1
   for Comp2 running on PL-3, with recommendedRecovery set to
   SA_AMF_COMPONENT_RESTART. Comp2 restarts.
   osafamfnd[2450]: NO Restarting a component of 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' (comp restart count: 1)
   osafamfnd[2450]: NO 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'errorReport' : Recovery is 'componentRestart'
   osafamfnd[2450]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => RESTARTING

2. Repeat tc #1 with an unconfigured component name such as
   safComp=AmfDem5. The call returns SA_AIS_ERR_NOT_EXIST(12).
   osafamfnd[3970]: NO Component 'safComp=AmfDem5,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' is not configured
   amf_demo[14441]: saAmfComponentErrorReport_4 FAILED - 12 on safComp=AmfDem5,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1

3. Stop PL-3 and rerun tc #1. The call returns SA_AIS_ERR_UNAVAILABLE(31).
   amf_demo[14922]: saAmfComponentErrorReport_4 FAILED - 31 on safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1

4. Repeat tc #1. When the error report call reaches amfnd on PL-3, hold
   amfnd in gdb and then stop PL-3. The call returns SA_AIS_ERR_TIMEOUT(5).
   amf_demo[15503]: saAmfComponentErrorReport_4 FAILED - 5 on safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1

5. Lock PL-3 and repeat tc #1. The component restarts on PL-3.

6. Lock and lock-in PL-3 and repeat tc #1. The error report returns
   SA_AIS_ERR_INVALID_PARAM(7).
   amf_demo[15773]: saAmfComponentErrorReport_4 FAILED - 7 on safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1

7. Lock CLM node PL-3 and repeat tc #1. The error report returns
   SA_AIS_ERR_UNAVAILABLE(31).
   amf_demo[15873]: saAmfComponentErrorReport_4 FAILED - 31 on safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1

8. Kill the component on PL-3 and return non-zero from the cleanup command,
   so that it goes into TERMINATION_FAILED. Now repeat tc #1. The call
   returns SA_AIS_ERR_INVALID_PARAM(7).
   amf_demo[16016]: saAmfComponentErrorReport_4 FAILED - 7 on safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1

9. Repeat tc #8 for INSTANTIATION_FAILED. The same result.

10. Repeat tc #1 with recommendedRecovery set to SA_AMF_NODE_SWITCHOVER.
   osafamfnd[2419]: NO 'safComp=AmfDemo,safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'errorReport' : Recovery is 'nodeSwitchover'
   osafamfnd[2419]: NO Informing director of Nodeswitchover

11. Repeat tc #1 while an SU unlock operation is in progress on SU2 of PL-3.
   While the admin unlock is in progress on SU2 (i.e. when it gets the Act
   cbk, hold the response for 5 seconds), call saAmfComponentErrorReport_4()
   from Comp1 (running on SC-1) as in tc #1. Comp2 will restart and get the
   Act assignment again.
   osafamfnd[3258]: NO Assigning 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
   osafamfnd[3258]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' component restart probation timer started (timeout: 400000000000 ns)
   osafamfnd[3258]: NO Restarting a component of 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' (comp restart count: 1)
   osafamfnd[3258]: NO 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1' Presence State RESTARTING => INSTANTIATED
   osafamfnd[3258]: NO Assigned 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'

12. Repeat tc #11 for SU shutdown/lock, node and SG lock/unlock/shutdown,
   and SI lock/unlock. The same result.

13. Repeat tc #1 for an NPI component. The NPI component gets restarted.

14. Repeat tc #3 for NPI. The same result.

15. Repeat tc #4 for NPI. The same result.


Testing, Expected Results:
--------------------------
As described above


Conditions of Submission:
-------------------------
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch      Built     Started    Linux distro
-------------------------------------------
mips        n          n
mips64      n          n
x86         n          n
x86_64      y          y
powerpc     n          n
powerpc64   n          n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
    that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
    (i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
    Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
    like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
    cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
    too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
    Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
    commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
    of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
    comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing the
    threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
    for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
    do not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
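
Reviewer aid (not part of the patch): the return codes seen in the test
cases above can be summarised in a small lookup. This is only a sketch.
The enum is a local stand-in that copies the numeric values quoted in the
logs (plus SA_AIS_OK = 1); a real client would take them from saAis.h.
The names DemoAisError and describe_error_report_rc are hypothetical and
do not exist in OpenSAF.

```cpp
#include <string>

// Local stand-in for the SaAisErrorT values quoted in the test log.
// Assumption: real code includes <saAis.h> instead of redefining these.
enum DemoAisError {
  DEMO_SA_AIS_OK = 1,
  DEMO_SA_AIS_ERR_TIMEOUT = 5,
  DEMO_SA_AIS_ERR_INVALID_PARAM = 7,
  DEMO_SA_AIS_ERR_NOT_EXIST = 12,
  DEMO_SA_AIS_ERR_UNAVAILABLE = 31
};

// Hypothetical helper: maps a saAmfComponentErrorReport_4() return code to
// the situation in which the test cases of this review observed it.
inline std::string describe_error_report_rc(int rc) {
  switch (rc) {
    case DEMO_SA_AIS_OK:
      return "report accepted; recovery runs on the hosting node";
    case DEMO_SA_AIS_ERR_TIMEOUT:
      return "hosting amfnd stopped responding mid-request (tc 4)";
    case DEMO_SA_AIS_ERR_INVALID_PARAM:
      return "node locked-in, or component in a FAILED presence state "
             "(tc 6, 8, 9)";
    case DEMO_SA_AIS_ERR_NOT_EXIST:
      return "component is not configured (tc 2)";
    case DEMO_SA_AIS_ERR_UNAVAILABLE:
      return "hosting node is down or its CLM node is locked (tc 3, 7)";
    default:
      return "not covered by the test cases in this review";
  }
}
```

For example, describe_error_report_rc(31) yields the text for tc 3 and
tc 7, which a test client could print next to the
"saAmfComponentErrorReport_4 FAILED - 31" message.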