HI Alex, I appreciate your contribution in CPSV , thanks for your patch , I will take some time reviewing/testing the complete patch (database a C++ STL map for fast access ) ,for now my initial comment is :
On 5/21/2014 2:57 AM, Alex Jones wrote: > make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of 30M. Considering the below fragmentation changes (changesets 5279- 5282) , I hope it is not required to change form 30MB to 3MB so can you please re-test ckpt API functions return with SA_AIS_ERR_TIMEOUT case and let me know the result. ====================================================================================== changeset: 5282:0a995a368ec3 user: Hans Feldt <[email protected]> date: Mon May 19 21:00:03 2014 +0200 summary: mds: adjust some constants [#654] changeset: 5281:ad842ed4d464 summary: mds: use TIPC segmentation/reassembly [#654] changeset: 5280:4d0da0f9e7cf summary: mds: support variable size fragmentation limit [#654] changeset: 5279:2a60ec043b3d summary: mds: use TIPC defined max msg size for rec buffer [#654] ====================================================================================== -AVM On 5/21/2014 2:57 AM, Alex Jones wrote: > Summary: cpnd: increase performance when creating large numbers of sections > [#770] > Review request for Trac Ticket(s): 770 > Peer Reviewer(s): AVM > Pull request to: AVM > Affected branch(es): default > Development branch: > > -------------------------------- > Impacted area Impact y/n > -------------------------------- > Docs n > Build system n > RPM/packaging n > Configuration files n > Startup scripts n > SAF services y > OpenSAF services n > Core libraries n > Samples n > Tests n > Other n > > > Comments (indicate scope for each "y" above): > --------------------------------------------- > Please ignore the previous "fat-finger" submission. > > changeset 6f9c8080657af783462c52d81137e41038aee6a6 > Author: Alex Jones <[email protected]> > Date: Tue, 20 May 2014 17:03:25 -0400 > > cpnd: increase performance when creating large numbers of sections > [#770] > > Jan 7 21:32:32.772347 <1789648919> ERR |MDTM: Frag recd is not next frag > so dropping adest=<0x010010023922604c> Jan 7 21:32:32.772399 > <1789648919> > ERR |MDTM: Message is dropped as msg is out of seq TRANSPOR- > ID=<0x010010023922604c> With large numbers of sections (>5k) on the > standby > the CPU is pegged and all ckpt API functions return with > SA_AIS_ERR_TIMEOUT, > including ActiveReplicaSet, and CheckpointClose! > > The section id database is implemented as a linked list. Each write to a > section must traverse the list in order to find the section. With > 1000's of > sections this takes a looong time. Also, sync data being sent over is > too > large for one packet (30M). This causes the transport layer (in this > case > TIPC), to drop packets. Lastly, the SectionCreate message is not > asynchronous when ACTIVE_REPLICA is specified. > > Solution is in 3 parts: (1) make the section id database a C++ STL map > for > fast access. (2) make MAX_SYNC_TRANSFER_SIZE much smaller: 3M instead of > 30M. (3) SectionCreate message should be asynchronous when > ACTIVE_REPLICA is > specified. > > > Added Files: > ------------ > osaf/services/saf/cpsv/cpnd/cpnd_sec.cc > > > Complete diffstat: > ------------------ > osaf/libs/common/cpsv/include/cpnd_cb.h | 6 +- > osaf/libs/common/cpsv/include/cpnd_init.h | 14 +- > osaf/libs/common/cpsv/include/cpsv_evt.h | 2 +- > osaf/services/saf/cpsv/cpnd/Makefile.am | 3 +- > osaf/services/saf/cpsv/cpnd/cpnd_db.c | 246 +---------------- > osaf/services/saf/cpsv/cpnd/cpnd_evt.c | 115 ++++--- > osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 26 +- > osaf/services/saf/cpsv/cpnd/cpnd_res.c | 15 +- > osaf/services/saf/cpsv/cpnd/cpnd_sec.cc | 424 > ++++++++++++++++++++++++++++++ > 9 files changed, 531 insertions(+), 320 deletions(-) > > > Testing Commands: > ----------------- > (1) Create a 6-node setup > (2) Open 5 checkpoints (40k sections for each checkpoint, each section 1k) for > writing, 1 on each active node > (3) On the standby node, open these 5 for reading > (4) Continuously write the active checkpoints as fast as possible. > > > Testing, Expected Results: > -------------------------- > The standby opening all 5 checkpoints should be able to keep up with all the > writes (200k sections). No dropped packets, no timeouts. > > > Conditions of Submission: > ------------------------- > > > > Arch Built Started Linux distro > ------------------------------------------- > mips n n > mips64 n n > x86 n n > x86_64 y y > powerpc n n > powerpc64 n n > > > Reviewer Checklist: > ------------------- > [Submitters: make sure that your review doesn't trigger any checkmarks!] > > > Your checkin has not passed review because (see checked entries): > > ___ Your RR template is generally incomplete; it has too many blank entries > that need proper data filled in. > > ___ You have failed to nominate the proper persons for review and push. > > ___ Your patches do not have proper short+long header > > ___ You have grammar/spelling in your header that is unacceptable. > > ___ You have exceeded a sensible line length in your headers/comments/text. > > ___ You have failed to put in a proper Trac Ticket # into your commits. > > ___ You have incorrectly put/left internal data in your comments/files > (i.e. internal bug tracking tool IDs, product names etc) > > ___ You have not given any evidence of testing beyond basic build tests. > Demonstrate some level of runtime or other sanity testing. > > ___ You have ^M present in some of your files. These have to be removed. > > ___ You have needlessly changed whitespace or added whitespace crimes > like trailing spaces, or spaces before tabs. > > ___ You have mixed real technical changes with whitespace and other > cosmetic code cleanup changes. These have to be separate commits. > > ___ You need to refactor your submission into logical chunks; there is > too much content into a single commit. > > ___ You have extraneous garbage in your review (merge commits etc) > > ___ You have giant attachments which should never have been sent; > Instead you should place your content in a public tree to be pulled. > > ___ You have too many commits attached to an e-mail; resend as threaded > commits, or place in a public tree for a pull. > > ___ You have resent this content multiple times without a clear indication > of what has changed between each re-send. > > ___ You have failed to adequately and individually address all of the > comments and change requests that were proposed in the initial review. > > ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) > > ___ Your computer have a badly configured date and time; confusing the > the threaded patch review. > > ___ Your changes affect IPC mechanism, and you don't present any results > for in-service upgradability test. > > ___ Your changes affect user manual and documentation, your patch series > do not contain the patch that updates the Doxygen manual. > ------------------------------------------------------------------------------ "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE Instantly run your Selenium tests across 300+ browser/OS combos. Get unparalleled scalability from the best Selenium testing platform available Simple to use. Nothing to install. Get started now for free." http://p.sf.net/sfu/SauceLabs _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
