Summary: Pre-review of 2PBE complete dataflow. Review request for Trac Ticket(s): (#21) Peer Reviewer(s): Neel Pull request to: Affected branch(es): devel(4.4) Development branch:
-------------------------------- Impacted area Impact y/n -------------------------------- Docs n Build system n RPM/packaging n Configuration files n Startup scripts n SAF services n OpenSAF services n Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): --------------------------------------------- Thisd is a pre-review of the basic data flow solution that I propose for 2PBE. By pre-review I mean: These specific patches will not be pushed. Thus I am not waiting for an "ack" on these patches. The intention is to communicate the development of 2PBE so far and to allow anyone to test and experimentat with it, or just inspect the code to get an understanding of how it will work. This is not a complete solution for 2PBE. There is still work to be done to get tighter syncronisation in the commit of an imm-transaction between the two PBEs. With regular single PBE, the commit of one imm-transaction was atomic with the commit with the commit to the sqlite file. ion fact the commit of the sqlite transaction *was* the commit of the imm-transaction. With 2PBE, the commit of the transaction to PBE is still the commit of the imm-transaction. But now there is also the issue of if and how to get the two sqlite files to commit atomically. My current stance on this is that I will not attempt to make the commit to the two sqlite files 100% atomic. This would either require at least two sqlite transaction commits/writes for every one imm-transaction, or require the utilization of some open 2pc interface in sqlite if that is available. I believe there is some kind of callback hook availble for sqlite transaction prepare. But I will not attempt to use that for now. The basic idea is instead to keep one primary PBE and then add a slave PBE. To make each imm-transaction commit with syncronization between the two PBEs before sqlite commit, minimizing the probability that one sqlite instance succeeds in commiting while the other sqlite instance fails. Still they may of course diverge, due to file system problems or crashes. The first thing to note here is that a cluster start, after such a broken PBE commit, will be handled by the loading arbitration and that arbitration will choose to load from the sqlite file that succeeded in the commit (the latest file) as long as it is available. And even if the cluster restart should come up with one SC with the file that failed to commit the last ccb and timeout in waiting for the other and thus load from the older file, then there is also no problem because this only means that the transaction will have been aborted. Note that the user/client of the transaction will not have obtained any answer on the outcome unless both PBEs succeeded in their commit. For regular CCBs, the primary PBE will prepare by executing all the sqlite calls needed to build the transaction, but before comitting it sends a a message to the slave PBE asking it it has received all requests to be part of the ccb and if so to also execute all the sqlite calls to prepare the ccb. When/if the standby replies with ok, then the primary PBE will commit its sqlite transaction and reply to immsv. The immsv will then commit the ccb in imm-cluster-ram. Finaly as part of sending messages about the commit of the ccb to all appliers, the slave PBE will get both completed and apply and will (hopefully) commit its sqlite transaction. In general, the slave PBE is tightly controlled by the primary PBE (an asymetric solution). For class management and PRTO/PRTA handling the solution is slightly different. In general I have also tried to leverage from existing mechanisms, such as making the slave PBE an applier of all config classes, thus avoiding additional and unnecessary distributed messaging for the payload of a ccb. changeset 131d635a526a0e894245ecb08b630268f9035976 Author: Anders Bjornerstedt <[email protected]> Date: Fri, 19 Jul 2013 08:39:25 +0200 IMM: 2PBE loading (test patch-1) [#21] This is a testpatch containing a test version of the 2PBE loading mechanism. This patch will not be pushed. The intent with this patch is to allow testing and obtaining feedback on the 2PBE loading. This firt patch has aready been sent out once earlier for pre-review. Keep in mind that it is for cluster restarts that all this is intended. So in one sense this is the most important patch to be as reliable as possible. -------------------------------------------------------------- changeset f240bedd6aa2b92bdea38d1c6532ac8097ac9444 Author: Anders Bjornerstedt <[email protected]> Date: Fri, 19 Jul 2013 08:45:07 +0200 IMM: 2PBE ccb-handling (test patch-2) [#21] This is a testpatch containing a test version of the 2PBE ccb-handling. This patch will not be pushed. The intent with this patch is to allow testing and obtaining feedback on the 2PBE ccb-handling. This test-patch goes on top of the "2PBE loading" testpatch. Contains the process management changes to start and restart 2 PBEs, including failover handling. Plus the data flow solution for CCBs. THe detailed synhronisation of ccb commit between the two PBEs is still missing. ------------------------------------------------------------- changeset 421acf69d0765d0860bfe726b8f1ba1ccb519e6b Author: Anders Bjornerstedt <[email protected]> Date: Fri, 19 Jul 2013 08:55:23 +0200 IMM: 2PBE class-create/delete/schema change handling (test patch-3) [#21] This is a testpatch containing a test version of the 2PBE class-handling. This patch will not be pushed. The intent with this patch is to allow testing and obtaining feedback on the 2PBE handling of imm classes. This test-patch goes on top of the "2PBE ccb-handling" testpatch. Provides 2PBE persistification of class create/delete/schema-change. Extends the existing PBE oultion that uses an admin-op from immsv to PBE so that the primary PBE invokes the same admin-op towards slave. --------------------------------------------------------------- changeset d6f91100b51e2b7215bf5fab95e10d0c069d14db Author: Anders Bjornerstedt <[email protected]> Date: Fri, 19 Jul 2013 09:10:43 +0200 IMM: 2PBE PRTO-create handling (test patch-4) [#21] This is a testpatch containing a test version of the 2PBE handling of creates of persistent runtime objects (PRTOs). This patch will not be pushed. The intent with this patch is to allow testing and obtaining feedback on the 2PBE handling of PRTO creates This test-patch goes on top of the "2PBE class-handling" testpatch. Provides 2PBE persistification of PRTO create. In this case the immsv directly sends the same payload callbacks to both the primary and slave PBE. The primary will invoke an admin-op towards the slave to syncronize (not implemented). changeset 416b82d2d116c2a9ede63b507c72c10ee2126f42 Author: Anders Bjornerstedt <[email protected]> Date: Fri, 19 Jul 2013 09:16:17 +0200 IMM: 2PBE PRTO-delete handling (test patch-5) [#21] This is a testpatch containing a test version of the 2PBE handling of deletes of persistent runtime objects (PRTOs). This patch will not be pushed. The intent with this patch is to allow testing and obtaining feedback on the 2PBE handling of PRTO deletes This test-patch goes on top of the "2PBE PRTO-create" testpatch. Provides 2PBE persistification of PRTO delete. This operation is quite different from PRTO create because PRTO delete can in general be the delete of a subtree, i.e. several PRTOs that have to be deleted as one transaction. Again the immsv sends the same messages that it sent to the PBE, now also to the slave PBE. changeset 894df553ef7e1b829db43329a5bafe4ce7fb2cbd Author: Anders Bjornerstedt <[email protected]> Date: Fri, 19 Jul 2013 09:20:45 +0200 IMM: 2PBE PRTA-update handling (test patch-6) [#21] This is a testpatch containing a test version of the 2PBE handling of updates of persistent runtime attributes (PRTAs). PRTAs can exist in either PRTOs or config objects. This patch will not be pushed. The intent with this patch is to allow testing and obtaining feedback on the 2PBE handling of PRTA updates This test-patch goes on top of the "2PBE PRTO-delete" Provides 2PBE persistification of PRTA updates. This more similar to PRTO create than it is PRTO delete, but PRTA updaes can be an update in a config object. Complete diffstat: ------------------ osaf/libs/agents/saf/imma/imma_oi_api.c | 4 +- osaf/libs/agents/saf/imma/imma_proc.c | 34 +++- osaf/libs/common/immsv/immpbe_dump.cc | 62 ++++++- osaf/libs/common/immsv/immsv_evt.c | 55 ++++++- osaf/libs/common/immsv/include/immpbe_dump.hh | 6 +- osaf/libs/common/immsv/include/immsv_api.h | 17 +- osaf/libs/common/immsv/include/immsv_evt.h | 18 +- osaf/libs/common/immsv/include/immsv_evt_model.h | 4 + osaf/services/saf/immsv/immd/immd_amf.c | 5 +- osaf/services/saf/immsv/immd/immd_cb.h | 6 +- osaf/services/saf/immsv/immd/immd_db.c | 2 + osaf/services/saf/immsv/immd/immd_evt.c | 82 ++++++++++- osaf/services/saf/immsv/immd/immd_main.c | 40 +++++- osaf/services/saf/immsv/immd/immd_proc.c | 229 +++++++++++++++++++++++++++++- osaf/services/saf/immsv/immd/immd_proc.h | 3 +- osaf/services/saf/immsv/immd/immd_sbevt.c | 23 ++- osaf/services/saf/immsv/immloadd/imm_loader.cc | 315 ++++++++++++++++++++++++++++++++++++------ osaf/services/saf/immsv/immloadd/imm_loader.hh | 8 +- osaf/services/saf/immsv/immloadd/imm_pbe_load.cc | 196 ++++++++++++++++++++++++- osaf/services/saf/immsv/immnd/ImmModel.cc | 333 ++++++++++++++++++++++++++++++++++--------- osaf/services/saf/immsv/immnd/ImmModel.hh | 17 +- osaf/services/saf/immsv/immnd/ImmSearchOp.cc | 5 - osaf/services/saf/immsv/immnd/immnd_cb.h | 6 +- osaf/services/saf/immsv/immnd/immnd_evt.c | 345 +++++++++++++++++++++++++++++++++++++++++++--- osaf/services/saf/immsv/immnd/immnd_init.h | 8 +- osaf/services/saf/immsv/immnd/immnd_main.c | 3 +- osaf/services/saf/immsv/immnd/immnd_proc.c | 246 +++++++++++++++++++++++++++------ osaf/services/saf/immsv/immpbed/immpbe.cc | 77 +++++---- osaf/services/saf/immsv/immpbed/immpbe.hh | 4 + osaf/services/saf/immsv/immpbed/immpbe_daemon.cc | 677 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------- tests/immsv/implementer/applier.c | 121 ++++++++++++--- 31 files changed, 2559 insertions(+), 392 deletions(-) Testing Commands: ----------------- Testing, Expected Results: -------------------------- The basic normal positive use cases should work, including the persistification of data to *both* pbe files. Most negative test cases (failover, killing of processes) should also. The main risk to watch for is when/if a PBE restarts with '--recover' which means it simply re-attaches to the sqlite file without regenerating a fresh file dumped from imm-ram, then there is a risk that there could have been introduced a discrepancy between the files. In particular, the last transaction being processed at the time of the crash could have commited at the other PBE, yet have been rolled back at the restarted PBE. Persistent writes are blocked as long as not both PBEs are available. Conditions of Submission: ------------------------- These patches will not be pushed. Arch Built Started Linux distro ------------------------------------------- mips n n mips64 n n x86 n n x86_64 n n powerpc n n powerpc64 n n Reviewer Checklist: ------------------- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. ------------------------------------------------------------------------------ See everything from the browser to the database with AppDynamics Get end-to-end visibility with application monitoring from AppDynamics Isolate bottlenecks and diagnose root cause in seconds. Start your free trial of AppDynamics Pro today! http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
