Hi,

This review has been out for a long time now, and there have been no more comments since the one below. I will push this enhancement in 4 days if there are no more comments. Please continue testing. After the push, write tickets for any problems you find.
Thanks,
Lennart

> -----Original Message-----
> From: Anders Widell
> Sent: 19 August 2013 12:19
> To: Lennart Lund
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH 0 of 7] Review Request for logsv: Fix hanging main
> thread when file i/o dont return
>
> One more comment:
>
> It would be good if the error code SA_AIS_ERR_TIMEOUT could be avoided.
> I think it can: by modifying the slave thread so that it undoes a write
> operation if the master thread timed out before the write was finished.
> Then the slave thread can use lseek() and ftruncate() to undo the write.
>
> regards,
> Anders Widell
>
> 2013-08-19 10:07, Lennart Lund wrote:
> > Summary: logsv: Fix hanging main thread when file i/o don't return
> > Review request for Trac Ticket(s): #9
> > Peer Reviewer(s): Madhurika Koppula, (Anders Widell, Hans Feldt)
> > Pull request to: NA
> > Affected branch(es): devel (4.4)
> > Development branch: <<IF ANY GIVE THE REPO URL>>
> >
> >
> > --------------------------------
> > Impacted area        Impact y/n
> > --------------------------------
> > Docs                    n
> > Build system            n
> > RPM/packaging           n
> > Configuration files     n
> > Startup scripts         n
> > SAF services            y
> > OpenSAF services        n
> > Core libraries          n
> > Samples                 n
> > Tests                   n
> > Other                   n
> >
> >
> > Comments (indicate scope for each "y" above):
> > ---------------------------------------------
> > To protect the log server "main thread" (MT) from hanging if a file
> > operation such as write, mkdir etc. does not return, all such
> > operations are done in a separate "file thread" (FT).
> > Functions running in the MT that need file system operations hand
> > execution over to the FT when file handling has to be done. Execution
> > is then given back to the MT. If a file operation does not return, the
> > FT will hang, but the MT will time out the FT and resume. A timeout is
> > handled as a failed file operation.
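A minimal sketch of the MT/FT handoff described above, assuming POSIX threads and one outstanding request at a time. All names here (ft_start(), ft_call(), file_thread(), simulated_io()) are hypothetical and are not the log server's actual API; the changesets mention log_file_api() and file_hndl_thread(), but their signatures are not shown in this thread. The MT hands the operation over and waits with pthread_cond_timedwait(); if the deadline passes, it marks the FT as hanging so that later requests fail at once instead of queueing behind the stuck operation. The timeout argument stands in for the configurable timeouts from Part 4. Build with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Outcome of handing one file operation to the file thread (FT). */
typedef enum { FT_OK, FT_FAIL, FT_TIMEOUT, FT_BUSY } ft_result_t;

/* Hypothetical handoff state shared by the main thread (MT) and the FT. */
static pthread_mutex_t ft_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ft_request_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t ft_done_cv = PTHREAD_COND_INITIALIZER;
static int (*ft_handler)(void *arg);   /* the file operation to run */
static void *ft_arg;
static bool ft_has_request, ft_done, ft_hanging;
static int ft_rc;

/* FT: wait for a request, run it without holding the lock, report back. */
static void *file_thread(void *unused) {
    (void)unused;
    pthread_mutex_lock(&ft_mutex);
    for (;;) {
        while (!ft_has_request)
            pthread_cond_wait(&ft_request_cv, &ft_mutex);
        int (*handler)(void *) = ft_handler;
        void *arg = ft_arg;
        pthread_mutex_unlock(&ft_mutex);

        int rc = handler(arg);   /* may block for a long time on a hung fs */

        pthread_mutex_lock(&ft_mutex);
        ft_rc = rc;
        ft_has_request = false;
        ft_done = true;
        ft_hanging = false;      /* the FT has returned and is usable again */
        pthread_cond_signal(&ft_done_cv);
    }
    return NULL;
}

int ft_start(void) {
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, file_thread, NULL);
    if (rc == 0)
        pthread_detach(tid);
    return rc;
}

/* MT: hand one operation to the FT and wait at most timeout_ms for it. */
ft_result_t ft_call(int (*handler)(void *), void *arg, long timeout_ms,
                    int *rc_out) {
    assert(timeout_ms > 0);
    pthread_mutex_lock(&ft_mutex);
    if (ft_hanging) {            /* FT still stuck on an earlier request */
        pthread_mutex_unlock(&ft_mutex);
        return FT_BUSY;          /* fail at once instead of queueing */
    }
    ft_handler = handler;
    ft_arg = arg;
    ft_done = false;
    ft_has_request = true;
    pthread_cond_signal(&ft_request_cv);

    struct timespec deadline;    /* absolute time, as timedwait expects */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* 0 = woken (possibly spuriously); ETIMEDOUT = deadline passed. */
    int wrc = 0;
    while (!ft_done && wrc == 0)
        wrc = pthread_cond_timedwait(&ft_done_cv, &ft_mutex, &deadline);

    ft_result_t result;
    if (ft_done) {
        *rc_out = ft_rc;
        result = (ft_rc == 0) ? FT_OK : FT_FAIL;
    } else {
        ft_hanging = true;       /* new requests fail until the FT returns */
        result = FT_TIMEOUT;
    }
    pthread_mutex_unlock(&ft_mutex);
    return result;
}

/* Test helper: sleeps *(long *)arg ms to imitate a hanging file system. */
int simulated_io(void *arg) {
    long ms = *(const long *)arg;
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    return 0;
}
```

With a 1500 ms simulated_io() request and a 200 ms timeout, ft_call() returns FT_TIMEOUT, later calls return FT_BUSY until the sleep ends, and the FT then accepts requests again. A real implementation might wait on CLOCK_MONOTONIC (set with pthread_condattr_setclock()) so that wall-clock adjustments do not stretch or shorten the timeout.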
> > The MT can detect if the FT is hanging, and new requests for file
> > operations will then be "failed".
> >
> > Note:
> > This review request contains all patches. Some of the "old" patches are
> > concatenated (qfold):
> >
> >   Old patches   New patch
> >   1 - 7         1
> >   8             2
> >   9 - 11        3
> >   12            4
> >   13            5
> >   14            6
> >   15            7
> >
> > Patches 5 - 7 are new for this review.
> >
> >
> > changeset b32ee924b9716330c8d7b54f5556fc235c31fbab
> > Author: Lennart Lund <[email protected]>
> > Date:   Mon, 19 Aug 2013 09:12:30 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 1
> >
> >     Generic thread handling:
> >     - Generic thread handling
> >     - Convert functions to use threaded file handling
> >     - Handling of object implementer rejects
> >     - Invalidate stream fd if errno is EBADF when writing a log record
> >     - Fix error handling for too long path (> PATH_MAX)
> >     - Functions that use a handler in the file thread have the extension _h
> >
> > changeset ed70f6043029ad9c7ea5f55439130023acea13bc
> > Author: Lennart Lund <[email protected]>
> > Date:   Mon, 19 Aug 2013 09:13:22 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 2
> >
> >     - Fix review remarks and some findings from test
> >     - Fix some findings from a code analysis tool
> >     - Cleanup of TRACE and LOG
> >     - Add information for contributors/maintainers about file system
> >       handling in the Log service README file
> >
> > changeset 1ab74048f572d3ecb843651dea627399e77a0afc
> > Author: Lennart Lund <[email protected]>
> > Date:   Mon, 19 Aug 2013 09:13:56 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 3
> >
> >     - Remove unnecessary data copying in log_file_api() and
> >       file_hndl_thread()
> >     - Return SA_AIS_ERR_TIMEOUT if the write operation times out when a
> >       log record is written.
> >       If the file thread is already "hanging" when a write is
> >       requested, no attempt to write is made and SA_AIS_ERR_TRY_AGAIN
> >       is returned as before.
> >     - Try to recover the file thread by recreating it if it hangs for a
> >       long time.
> >     - Recover if bad file descriptor or stale NFS handle.
> >
> >     - Always reinitialize/reopen log files if a write operation fails,
> >       timeout of the file thread (hanging file system) included.
> >     - Handle synchronization between nodes when log files cannot be
> >       created before a switchover, without using any new flag that has
> >       to be checkpointed (remove the "files_initialized" flag)
> >     - Incorrect handling of "partial write" is fixed. See #536
> >
> >     - Open log files with O_NONBLOCK. Answer the client with
> >       SA_AIS_ERR_TIMEOUT if EWOULDBLOCK/EAGAIN (the record may be
> >       partially written)
> >
> > changeset 9891f1e38d7c32cb5be4f929101548dc466c79b9
> > Author: Lennart Lund <[email protected]>
> > Date:   Mon, 19 Aug 2013 09:18:18 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 4
> >
> >     - Make timeouts for file handling configurable in the Log service
> >       configuration object
> >
> > changeset dd80bd737715084537c0affe959ba619d0752fba
> > Author: Lennart Lund <[email protected]>
> > Date:   Mon, 19 Aug 2013 09:19:03 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 5
> >
> >     - Fix error in lgs_make_reldir_h(). The root directory can be
> >       corrupt if the file thread is hanging.
> >
> > changeset 15498547b6da54ecdecf2f6e2806d60027d291dc
> > Author: Lennart Lund <[email protected]>
> > Date:   Mon, 19 Aug 2013 09:20:52 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 6
> >
> >     - Remove thread recovery handling (kill and restart thread)
> >
> > changeset 08a97594d0471b611baff692bd8704392a6d6f50
> > Author: Lennart Lund <[email protected]>
> > Date:   Mon, 19 Aug 2013 09:22:09 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 7
> >
> >     - Update saflogger to handle SA_AIS_ERR_TIMEOUT
> >
> >
> > Added Files:
> > ------------
> > README_LOGENH
> > osaf/services/saf/logsv/lgs/lgs_file.c
> > osaf/services/saf/logsv/lgs/lgs_file.h
> > osaf/services/saf/logsv/lgs/lgs_filehdl.c
> > osaf/services/saf/logsv/lgs/lgs_filehdl.h
> >
> >
> > Removed Files:
> > --------------
> > README_LOGENH
> >
> >
> > Complete diffstat:
> > ------------------
> > osaf/services/saf/logsv/README            |   23 +++
> > osaf/services/saf/logsv/lgs/Makefile.am   |    8 +-
> > osaf/services/saf/logsv/lgs/lgs.h         |    1 +
> > osaf/services/saf/logsv/lgs/lgs_cb.h      |    2 +
> > osaf/services/saf/logsv/lgs/lgs_evt.c     |    5 +-
> > osaf/services/saf/logsv/lgs/lgs_evt.h     |    4 +
> > osaf/services/saf/logsv/lgs/lgs_file.c    |  416 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > osaf/services/saf/logsv/lgs/lgs_file.h    |   71 ++++++++++
> > osaf/services/saf/logsv/lgs/lgs_filehdl.c |  612 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > osaf/services/saf/logsv/lgs/lgs_filehdl.h |  162 +++++++++++++++++++++++
> > osaf/services/saf/logsv/lgs/lgs_imm.c     |  227 ++++++++++++++++++++++++--------
> > osaf/services/saf/logsv/lgs/lgs_main.c    |   12 +-
> > osaf/services/saf/logsv/lgs/lgs_mbcsv.c   |    7 +
> > osaf/services/saf/logsv/lgs/lgs_mbcsv.h   |    3 +
> > osaf/services/saf/logsv/lgs/lgs_stream.c  |  594 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------
> > osaf/services/saf/logsv/lgs/lgs_stream.h  |    4 +-
> > osaf/services/saf/logsv/lgs/lgs_util.c    |  446 +++++++++++++++++++++++++++++++++++++++++-----------------------------
> > osaf/services/saf/logsv/lgs/lgs_util.h    |   21 ++-
> > 18 files changed, 2147 insertions(+), 471 deletions(-)
> >
> >
> > Testing Commands:
> > -----------------
> > 1. Regression test
> >      > logtest
> > 2. Switchover test (using the alarm stream)
> >      > saflogger -l -s crit "alarm message 1"
> >      > cat repl_opensaf/saflog/saLogAlarm_SOME_DATE.log
> >    Printout containing "alarm message 1"
> >      > immadm -o 7 safSi=SC-2N,safApp=OpenSAF
> >      > saflogger -l -s crit "alarm message 2"
> >      > cat repl_opensaf/saflog/saLogAlarm_SOME_DATE.log
> >    Printout containing "alarm message 1" and "alarm message 2"
> > 3. Redo the tests after node start with a simulated
> >    unavailable file system for the log service
> >    - Activate the simulated unavailable file system by uncommenting
> >      the LLD_DELAY_TST define in file lgs_file.c in the log server.
> >      This "hangs" the "file thread" for some time during system
> >      start.
> >    - Rebuild the log server.
> >    - Remove old log files in repl_opensaf/saflog/
> >    - Start the cluster with the rebuilt log server.
> >      Note: The repl_opensaf/saflog directory is empty after system
> >      start. The .cfg and .log files for the alarm, notification and
> >      system streams that can normally be found are missing, since
> >      they could not be created during system start. However, files
> >      for the respective log streams will be created when log records
> >      are written.
> >    - Re-run tests 1 and 2.
> >
> > Note:
> > The current logtest uses a hard-coded log root path that may not
> > work on some systems and that is not the default path found in the
> > log service configuration object. A ticket for logtest has been
> > written [#541]. A fix exists, and patches can be found in a review
> > request for ticket #541.
> >
> >
> > Testing, Expected Results:
> > --------------------------
> > 1. Regression test with no failures.
> > 2. "alarm message 1" and "alarm message 2" found in the same file.
> > 3.1. Regression test with no failures.
> > 3.2. "alarm message 1" and "alarm message 2" found in the same file.
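Anders Widell's suggestion earlier in this thread (let the file thread undo a write with lseek() and ftruncate() when the main thread has already timed out) could look roughly like the sketch below. It assumes the log server is the only writer to the file, and it uses a hypothetical per-request atomic state so that exactly one side decides the outcome: either the FT commits the record before the MT gives up, or the MT abandons the request and the FT cuts the file back to its previous length. Short writes are retried, which relates to the "partial write" fix (#536) in Part 3. All names are illustrative and not taken from lgs_file.c or lgs_filehdl.c. Note that the undo can only run once write() returns; it does not help while the FT is still blocked inside the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() and O_* flags for callers */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-request state shared by the main thread (MT) and the
 * file thread (FT). The MT stores REQ_PENDING before each handoff. */
enum { REQ_PENDING, REQ_COMMITTED, REQ_ABANDONED };
static atomic_int req_state = REQ_PENDING;

/* MT side: call when the timed wait for the FT expires. Returns true if the
 * FT had already committed the record, so success can still be reported. */
bool mt_abandon_request(void) {
    int expected = REQ_PENDING;
    if (atomic_compare_exchange_strong(&req_state, &expected, REQ_ABANDONED))
        return false;   /* FT not finished: it will undo its write */
    return expected == REQ_COMMITTED;
}

/* FT side: append one complete record, or leave the file as it was.
 * Returns 0 if the record was kept, -1 if it failed or was rolled back. */
int write_record_or_undo(int fd, const char *rec, size_t len) {
    assert(rec != NULL);
    off_t start = lseek(fd, 0, SEEK_END);   /* where this record begins */
    if (start == (off_t)-1)
        return -1;

    size_t done = 0;
    bool failed = false;
    while (done < len && !failed) {
        ssize_t n = write(fd, rec + done, len - done);
        if (n > 0)
            done += (size_t)n;      /* short write: continue with the rest */
        else if (n == 0 || errno != EINTR)
            failed = true;          /* EIO, ENOSPC, EAGAIN, ...: give up */
        /* n < 0 with EINTR: interrupted before writing, so retry */
    }

    int expected = REQ_PENDING;
    if (!failed &&
        atomic_compare_exchange_strong(&req_state, &expected, REQ_COMMITTED))
        return 0;                   /* MT was still waiting: keep the record */

    /* The write failed or the MT gave up: best-effort cut back to the old
     * size. If ftruncate() itself fails, a partial record may remain. */
    if (ftruncate(fd, start) == 0)
        (void)lseek(fd, start, SEEK_SET);
    return -1;
}
```

With this split, the MT could answer an abandoned request with SA_AIS_ERR_TRY_AGAIN, as it already does when the FT is known to be hanging, instead of SA_AIS_ERR_TIMEOUT, because the record is not left in the file (provided ftruncate() succeeds).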
> >
> >
> > Conditions of Submission:
> > -------------------------
> > <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
> >
> >
> > Arch      Built     Started    Linux distro
> > -------------------------------------------
> > mips        n          n
> > mips64      n          n
> > x86         n          n
> > x86_64      n          n
> > powerpc     n          n
> > powerpc64   n          n
> >
> >
> > Reviewer Checklist:
> > -------------------
> > [Submitters: make sure that your review doesn't trigger any checkmarks!]
> >
> >
> > Your checkin has not passed review because (see checked entries):
> >
> > ___ Your RR template is generally incomplete; it has too many blank entries
> >     that need proper data filled in.
> >
> > ___ You have failed to nominate the proper persons for review and push.
> >
> > ___ Your patches do not have proper short+long header
> >
> > ___ You have grammar/spelling in your header that is unacceptable.
> >
> > ___ You have exceeded a sensible line length in your headers/comments/text.
> >
> > ___ You have failed to put in a proper Trac Ticket # into your commits.
> >
> > ___ You have incorrectly put/left internal data in your comments/files
> >     (i.e. internal bug tracking tool IDs, product names etc)
> >
> > ___ You have not given any evidence of testing beyond basic build tests.
> >     Demonstrate some level of runtime or other sanity testing.
> >
> > ___ You have ^M present in some of your files. These have to be removed.
> >
> > ___ You have needlessly changed whitespace or added whitespace crimes
> >     like trailing spaces, or spaces before tabs.
> >
> > ___ You have mixed real technical changes with whitespace and other
> >     cosmetic code cleanup changes. These have to be separate commits.
> >
> > ___ You need to refactor your submission into logical chunks; there is
> >     too much content into a single commit.
> >
> > ___ You have extraneous garbage in your review (merge commits etc)
> >
> > ___ You have giant attachments which should never have been sent;
> >     Instead you should place your content in a public tree to be pulled.
> >
> > ___ You have too many commits attached to an e-mail; resend as threaded
> >     commits, or place in a public tree for a pull.
> >
> > ___ You have resent this content multiple times without a clear indication
> >     of what has changed between each re-send.
> >
> > ___ You have failed to adequately and individually address all of the
> >     comments and change requests that were proposed in the initial review.
> >
> > ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
> >
> > ___ Your computer has a badly configured date and time, confusing the
> >     threaded patch review.
> >
> > ___ Your changes affect the IPC mechanism, and you don't present any
> >     results for an in-service upgradability test.
> >
> > ___ Your changes affect the user manual and documentation, but your patch
> >     series does not contain the patch that updates the Doxygen manual.

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
