Hi Praveen,

I normally don't get involved in AMF patch reviews, but this ticket and the fix caught my attention. There is a general issue with the approach that bothers me, if I have not misunderstood it.
I understand this is a node failover of the active controller. That is inherently an event that is not fully under control. It is also an event that is really time critical.

A failover may occur in several ways. Here it seems that one kind of failover is "semi-controllable", and the old active is in essence trying to "clean up" its backlog in a job queue before it triggers the failover. There will be other failover cases, such as a crash of the IMMD, where it will not be able to do this. So any cleanup (if necessary) must be covered by the new active anyway.

In addition, updating cached runtime data is a secondary duty of the AMF. Cached runtime data is CACHED and is not absolutely obligated to reflect the original state (which is in the AMF) in real time. So updates of cached runtime data should not really be a reason for delaying a failover.

/AndersBj

-----Original Message-----
From: praveen.malv...@oracle.com [mailto:praveen.malv...@oracle.com]
Sent: 7 May 2014 10:26
To: Hans Feldt; nagendr...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 0 of 1] Review Request for amfd: update RT objects before node-failover of active controller [#494].

Summary: amfd: update RT objects before node-failover of active controller [#494].
Review request for Trac Ticket(s): #494 (its duplicates #853 and #858)
Peer Reviewer(s): Hans F., Nagendra.
Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
Affected branch(es): All
Development branch: <<IF ANY GIVE THE REPO URL>>

--------------------------------
Impacted area       Impact y/n
--------------------------------
 Docs                    n
 Build system            n
 RPM/packaging           n
 Configuration files     n
 Startup scripts         n
 SAF services            n
 OpenSAF services        y
 Core libraries          n
 Samples                 n
 Tests                   n
 Other                   n

Comments (indicate scope for each "y" above):
---------------------------------------------
Please see the analysis of tickets and commit log below.
changeset bcf6eda79102f83c6940d75dd13073a9130026d0
Author: praveen.malv...@oracle.com
Date: Wed, 07 May 2014 13:43:33 +0530

amfd: update RT objects before node-failover of active controller [#494].

Problem: Runtime objects and attributes are not updated when node-failover gets escalated for the active controller and the standby controller takes the active role.

Reason: Activities related to updating runtime objects and certain attributes in IMM are given low priority and are pushed onto the job queue by AMF. These jobs are completed when AMF is not busy with any other high-priority activity. When node-failover is escalated, AMFD sends a reboot message to AMFND to reboot the node. When node-failover is escalated for the active controller, AMFD sends the reboot message to AMFND, which reboots the controller. In that case, some IMM-related activities in the job queue remain uncompleted. All such activities should be completed before rebooting the active controller when node-failover is escalated for it.

Fix: Finish all IMM-related jobs before sending the reboot message to AMFND when node-failover is escalated for the active controller.

Complete diffstat:
------------------
 osaf/services/saf/amf/amfd/sgproc.cc | 6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

Testing Commands:
-----------------
Tested the duplicate bug #858. This is easy to reproduce.
After reproducing, observed the states:
safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
  saAmfSUAdminState=UNLOCKED(1)
  saAmfSUOperState=ENABLED(1)
  saAmfSUPresenceState=UNINSTANTIATED(1)
  saAmfSUReadinessState=IN-SERVICE(2)

Testing, Expected Results:
--------------------------
Pass; observed the states:
safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
  saAmfSUAdminState=UNLOCKED(1)
  saAmfSUOperState=DISABLED(2)
  saAmfSUPresenceState=UNINSTANTIATED(1)
  saAmfSUReadinessState=OUT-OF-SERVICE(1)

AMFD logs:
May 7 12:05:47.624746 osafamfd [26472:imm.cc:0143] >> exec: Update 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' saAmfSUReadinessState
May 7 12:05:47.624799 osafamfd [26472:imma_oi_api.c:2270] >> saImmOiRtObjectUpdate_2
May 7 12:05:47.626863 osafamfd [26472:mds_dt_trans.c:0671] >> mdtm_process_poll_recv_data_tcp
May 7 12:05:47.627392 osafamfd [26472:imma_oi_api.c:2554] << saImmOiRtObjectUpdate_2
May 7 12:05:47.627419 osafamfd [26472:imm.cc:0172] << exec
May 7 12:05:47.634134 osafamfd [26472:util.cc:1681] TR Sending REBOOT MSG to 2010f
May 7 12:05:47.634372 osafamfd [26472:sgproc.cc:0715] << avd_su_oper_state_evh

Conditions of Submission:
-------------------------
Ack from one of the reviewers.

Arch        Built   Started   Linux distro
-------------------------------------------
mips          n       n
mips64        n       n
x86           n       n
x86_64        y       y
powerpc       n       n
powerpc64     n       n

Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent; instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication of what has changed between each re-send.
___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
___ Your computer has a badly configured date and time, confusing the threaded patch review.
___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test.
___ Your changes affect user manual and documentation, and your patch series does not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
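
For readers following the thread, the ordering described in the commit log (finish the queued IMM jobs, then send the reboot message) can be sketched roughly as below. This is only an illustrative C++ sketch and not the actual sgproc.cc change: `ImmJobQueue`, `execute_all` and `failover_active_controller` are hypothetical names, and the real AMFD job queue, IMM calls and reboot message are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for AMFD's queue of low-priority IMM update jobs.
// None of these names are taken from the OpenSAF sources.
class ImmJobQueue {
 public:
  using Job = std::function<void()>;

  void push(Job job) { jobs_.push_back(std::move(job)); }
  bool empty() const { return jobs_.empty(); }

  // Execute every queued job in FIFO order; returns how many jobs ran.
  std::size_t execute_all() {
    std::size_t ran = 0;
    while (!jobs_.empty()) {
      Job job = std::move(jobs_.front());
      jobs_.pop_front();
      job();
      ++ran;
    }
    return ran;
  }

 private:
  std::deque<Job> jobs_;
};

// Hypothetical failover path for the active controller: flush all pending
// IMM jobs first, and only then send the reboot message. The flush is
// unbounded, so the reboot waits for however long the jobs take.
std::size_t failover_active_controller(ImmJobQueue& queue,
                                       const std::function<void()>& send_reboot) {
  assert(static_cast<bool>(send_reboot));  // a reboot callback must be supplied
  const std::size_t flushed = queue.execute_all();
  send_reboot();
  return flushed;
}
```

Because `execute_all` runs until the queue is empty, the reboot message is held back by however long the queued jobs take, which is the delay questioned in the review at the top of this thread. Failover cases where the old active cannot run this path at all are not covered by the sketch.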