Hi

This review has been out for a long time now, and there have been no more
comments after the one below.
I will push this enhancement in 4 days if there are no more comments.
Please continue testing. After the push, please write tickets for anything you find.

Thanks,
Lennart

> -----Original Message-----
> From: Anders Widell
> Sent: 19 August 2013 12:19
> To: Lennart Lund
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH 0 of 7] Review Request for logsv: Fix hanging main
> thread when file i/o dont return
> 
> One more comment:
> 
> It would be good if the error code SA_AIS_ERR_TIMEOUT could be avoided.
> I think it can: by modifying the slave thread so that it undoes a write
> operation if the master thread timed out before the write was finished.
> Then the slave thread can use lseek() and ftruncate() to undo the write.
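
To make the suggestion concrete, here is a minimal sketch of such an undo, assuming the record is appended at the end of the file. It is not code from the patches: write_or_undo(), open_log() and the timed_out flag are hypothetical names, and the flag is assumed to be set by the master thread under whatever locking the two threads already share.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) a log file for reading and writing. */
int open_log(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0644);
}

/*
 * Hypothetical helper: append a record and roll the file back to its previous
 * size if the write was incomplete or the master thread gave up meanwhile.
 * Returns 0 if the record was kept, 1 if it was undone, -1 on error.
 */
int write_or_undo(int fd, const void *buf, size_t len, const int *timed_out)
{
    assert(buf != NULL && timed_out != NULL);

    /* Remember where the record starts (current end of file). */
    off_t start = lseek(fd, 0, SEEK_END);
    if (start == (off_t)-1)
        return -1;

    ssize_t written = write(fd, buf, len);
    if (written == (ssize_t)len && !*timed_out)
        return 0;

    /* Failed, partial or too late: cut the file back and reposition. */
    if (ftruncate(fd, start) == -1 || lseek(fd, start, SEEK_SET) == (off_t)-1)
        return -1;
    return written < 0 ? -1 : 1;
}
```

With something like this the client could be told that the record was not written at all, at the cost of the undo only taking effect once the blocked write() finally returns.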
> 
> regards,
> Anders Widell
> 
> 2013-08-19 10:07, Lennart Lund wrote:
> > Summary: logsv: Fix hanging main thread when file i/o don't return
> > Review request for Trac Ticket(s): #9
> > Peer Reviewer(s): Madhurika Koppula, (Anders Widell, Hans Feldt)
> > Pull request to: NA
> > Affected branch(es): devel (4.4)
> > Development branch: <<IF ANY GIVE THE REPO URL>>
> >
> >
> > --------------------------------
> > Impacted area       Impact y/n
> > --------------------------------
> >   Docs                    n
> >   Build system            n
> >   RPM/packaging           n
> >   Configuration files     n
> >   Startup scripts         n
> >   SAF services            y
> >   OpenSAF services        n
> >   Core libraries          n
> >   Samples                 n
> >   Tests                   n
> >   Other                   n
> >
> >
> > Comments (indicate scope for each "y" above):
> > ---------------------------------------------
> > In order to protect the log server "main thread" (MT) from hanging if
> > a file operation like write, mkdir etc. does not return, all such
> > operations are done in a separate "file thread" (FT).
> > Functions running in the MT that need file system operations hand
> > over execution to the FT when file handling has to be done. Execution
> > is then given back to the MT. If a file operation does not return, the
> > FT will hang, but the MT will time out the FT and resume. A timeout is
> > handled as a failed file operation.
> > The MT can detect if the FT is hanging, and new requests for file
> > operations will then be "failed".
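
The MT/FT handover described above can be sketched roughly as follows. This is an illustration only, not the code in lgs_file.c: every name in it (file_op_request(), file_thread_start(), FILE_OP_TIMEOUT, FILE_OP_BUSY, op_ok(), op_slow()) is made up, and it assumes a single requesting thread and a condition variable with a timed wait as the synchronization mechanism.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* All names below are made up for this sketch. */
enum { FILE_OP_TIMEOUT = -2, FILE_OP_BUSY = -3 };
typedef int (*file_op_fn)(void *arg);

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t request_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t answer_cv = PTHREAD_COND_INITIALIZER;
static file_op_fn pending_op;
static void *pending_arg;
static int request_ready, answer_ready, ft_busy, op_result;

/* File thread (FT): runs one requested operation at a time. */
static void *file_thread(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&mtx);
    for (;;) {
        while (!request_ready)
            pthread_cond_wait(&request_cv, &mtx);
        file_op_fn op = pending_op;
        void *arg = pending_arg;
        request_ready = 0;
        ft_busy = 1;
        pthread_mutex_unlock(&mtx);
        int rc = op(arg);            /* may block if the file system hangs */
        pthread_mutex_lock(&mtx);
        op_result = rc;
        ft_busy = 0;
        answer_ready = 1;
        pthread_cond_signal(&answer_cv);
    }
    return NULL;
}

int file_thread_start(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, file_thread, NULL);
    if (rc == 0)
        pthread_detach(tid);
    return rc;
}

/* Main thread (MT): hand op over to the FT and wait at most timeout_ms.
 * arg must stay valid after a timeout, since the FT may still be using it. */
int file_op_request(file_op_fn op, void *arg, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(op != NULL);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&mtx);
    if (ft_busy || request_ready) {  /* FT still stuck in an earlier request */
        pthread_mutex_unlock(&mtx);
        return FILE_OP_BUSY;
    }
    pending_op = op;
    pending_arg = arg;
    answer_ready = 0;
    request_ready = 1;
    pthread_cond_signal(&request_cv);
    while (!answer_ready && rc == 0)
        rc = pthread_cond_timedwait(&answer_cv, &mtx, &deadline);
    rc = answer_ready ? op_result : FILE_OP_TIMEOUT;
    pthread_mutex_unlock(&mtx);
    return rc;
}

/* Stand-ins for real file operations, only for trying the sketch out. */
int op_ok(void *arg) { (void)arg; return 0; }
int op_slow(void *arg)
{
    struct timespec nap = { 0, 300 * 1000000L };  /* 300 ms */
    (void)arg;
    nanosleep(&nap, NULL);
    return 0;
}
```

In this sketch a request made while the FT is still stuck is refused at once with FILE_OP_BUSY, which corresponds to new requests being "failed" while the FT is hanging.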
> >
> > Note:
> > This review request contains all patches. Some of the "old" patches
> > are concatenated (qfold)
> > Old patches       New patch
> > 1 - 7             1
> > 8                 2
> > 9 - 11            3
> > 12                4
> > 13                5
> > 14                6
> > 15                7
> >
> > Patches 5 - 7 are new for this review.
> >
> >
> > changeset b32ee924b9716330c8d7b54f5556fc235c31fbab
> > Author:     Lennart Lund <[email protected]>
> > Date:       Mon, 19 Aug 2013 09:12:30 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 1
> >
> >     Generic thread handling:
> >     - Generic thread handling
> >     - Convert functions to use threaded file handling
> >     - Handling of object implementer rejects
> >     - Invalidate stream fd if errno EBADF when writing log record
> >     - Fix Error handling for too long path (> PATH_MAX)
> >     - Functions that use a handler in the file thread have got the suffix _h
> >
> > changeset ed70f6043029ad9c7ea5f55439130023acea13bc
> > Author:     Lennart Lund <[email protected]>
> > Date:       Mon, 19 Aug 2013 09:13:22 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 2
> >
> >     - Fix review remarks and some findings from test
> >     - Fix some findings found when using code analysis tool
> >     - Cleanup of TRACE and LOG
> >     - Add information for contributors/maintainers about file system
> >     handling in the Log-service README file
> >
> > changeset 1ab74048f572d3ecb843651dea627399e77a0afc
> > Author:     Lennart Lund <[email protected]>
> > Date:       Mon, 19 Aug 2013 09:13:56 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 3
> >
> >     - Remove unnecessary data copying in log_file_api() and
> >     file_hndl_thread()
> >     - Return SA_AIS_ERR_TIMEOUT if the write operation times out when a
> >     log record shall be written. If the file thread is already "hanging"
> >     when a write is requested, no attempt to write is made and
> >     SA_AIS_ERR_TRY_AGAIN is returned as before.
> >     - Try to recover file thread by recreating it if it hangs for a long
> >     time.
> >     - Recover if bad file descriptor or stale NFS handle.
> >
> >     - Always reinitialize/reopen log files if a write operation fails,
> >     timeout of file thread (hanging file system) included.
> >     - Handle synchronization between nodes when log files cannot be
> >     created before a switch over without using any new flag that has to
> >     be checkpointed (remove "files_initialized" flag)
> >     - Incorrect handling of "partial write" is fixed. See #536
> >
> >     - Open log files with O_NONBLOCK. Answer client with AIS_ERR_TIMEOUT
> >     if EWOULDBLOCK/EAGAIN (record may be partially written)
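
As an illustration of the partial-write and EWOULDBLOCK/EAGAIN items above, a write loop for a descriptor opened with O_NONBLOCK might look like the sketch below. It is not taken from the patches: write_record() and the rec_status codes are hypothetical, and mapping REC_WOULD_BLOCK to the timeout answer is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum rec_status { REC_OK, REC_WOULD_BLOCK, REC_FAILED };

/*
 * Write a whole record to fd. *done is set to the number of bytes that were
 * accepted, so the caller can tell a partially written record from one that
 * was never started.
 */
enum rec_status write_record(int fd, const char *rec, size_t len, size_t *done)
{
    assert(rec != NULL && done != NULL);
    *done = 0;
    while (*done < len) {
        ssize_t n = write(fd, rec + *done, len - *done);
        if (n > 0) {
            *done += (size_t)n;     /* partial write: continue with the rest */
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted before anything was written */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return REC_WOULD_BLOCK; /* record may be partially written */
        } else {
            return REC_FAILED;      /* e.g. EBADF: caller should reopen the file */
        }
    }
    return REC_OK;
}
```

A caller that gets REC_WOULD_BLOCK with *done greater than zero knows that part of the record is already in the file.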
> >
> > changeset 9891f1e38d7c32cb5be4f929101548dc466c79b9
> > Author:     Lennart Lund <[email protected]>
> > Date:       Mon, 19 Aug 2013 09:18:18 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 4
> >
> >     - Make timeouts for file hdl configurable in Log service configuration
> >     object
> >
> > changeset dd80bd737715084537c0affe959ba619d0752fba
> > Author:     Lennart Lund <[email protected]>
> > Date:       Mon, 19 Aug 2013 09:19:03 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 5
> >
> >     - Fix error in lgs_make_reldir_h(). Root directory can be corrupt if
> >     file thread is hanging.
> >
> > changeset 15498547b6da54ecdecf2f6e2806d60027d291dc
> > Author:     Lennart Lund <[email protected]>
> > Date:       Mon, 19 Aug 2013 09:20:52 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 6
> >
> >     - Remove thread recovery handling (kill and restart thread)
> >
> > changeset 08a97594d0471b611baff692bd8704392a6d6f50
> > Author:     Lennart Lund <[email protected]>
> > Date:       Mon, 19 Aug 2013 09:22:09 +0200
> >
> >     logsv: Fix hanging main thread when file i/o don't return. [#9] Part 7
> >
> >     - Update saflogger to handle SA_AIS_ERR_TIMEOUT
> >
> >
> > Added Files:
> > ------------
> >   README_LOGENH
> >   osaf/services/saf/logsv/lgs/lgs_file.c
> >   osaf/services/saf/logsv/lgs/lgs_file.h
> >   osaf/services/saf/logsv/lgs/lgs_filehdl.c
> >   osaf/services/saf/logsv/lgs/lgs_filehdl.h
> >
> >
> > Removed Files:
> > --------------
> >   README_LOGENH
> >
> >
> > Complete diffstat:
> > ------------------
> >   osaf/services/saf/logsv/README            |   23 +++
> >   osaf/services/saf/logsv/lgs/Makefile.am   |    8 +-
> >   osaf/services/saf/logsv/lgs/lgs.h         |    1 +
> >   osaf/services/saf/logsv/lgs/lgs_cb.h      |    2 +
> >   osaf/services/saf/logsv/lgs/lgs_evt.c     |    5 +-
> >   osaf/services/saf/logsv/lgs/lgs_evt.h     |    4 +
> >   osaf/services/saf/logsv/lgs/lgs_file.c    |  416 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >   osaf/services/saf/logsv/lgs/lgs_file.h    |   71 ++++++++++
> >   osaf/services/saf/logsv/lgs/lgs_filehdl.c |  612 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >   osaf/services/saf/logsv/lgs/lgs_filehdl.h |  162 +++++++++++++++++++++++
> >   osaf/services/saf/logsv/lgs/lgs_imm.c     |  227 ++++++++++++++++++++++++--------
> >   osaf/services/saf/logsv/lgs/lgs_main.c    |   12 +-
> >   osaf/services/saf/logsv/lgs/lgs_mbcsv.c   |    7 +
> >   osaf/services/saf/logsv/lgs/lgs_mbcsv.h   |    3 +
> >   osaf/services/saf/logsv/lgs/lgs_stream.c  |  594 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------
> >   osaf/services/saf/logsv/lgs/lgs_stream.h  |    4 +-
> >   osaf/services/saf/logsv/lgs/lgs_util.c    |  446 +++++++++++++++++++++++++++++++++++++++++-----------------------
> >   osaf/services/saf/logsv/lgs/lgs_util.h    |   21 ++-
> >   18 files changed, 2147 insertions(+), 471 deletions(-)
> >
> >
> > Testing Commands:
> > -----------------
> > 1. Regression test
> >> logtest
> > 2. Switch over test (using alarm stream)
> >> saflogger -l -s crit "alarm message 1"
> >> cat repl_opensaf/saflog/saLogAlarm_SOME_DATE.log
> >   Printout containing "alarm message 1"
> >> immadm -o 7 safSi=SC-2N,safApp=OpenSAF
> >> saflogger -l -s crit "alarm message 2"
> >> cat repl_opensaf/saflog/saLogAlarm_SOME_DATE.log
> >   Printout containing "alarm message 1" and "alarm message 2"
> > 3. Redo tests after node start with simulated
> >     unavailable filesystem for the log service
> >   - Activate simulated unavailable file system by uncommenting
> >     the LLD_DELAY_TST define in file lgs_file.c in the log server.
> >     This makes the "file thread" "hang" for some time during system
> >     start.
> >   - Rebuild the log server.
> >   - Remove old log files in repl-opensaf/saflog/
> >   - Start the cluster with the rebuilt log server.
> >     Note: The repl_opensaf/saflog directory is empty after system
> >           start. The .cfg and .log files for alarm, notification and
> >           system that can normally be found are missing since they
> >           could not be created during system start. However, files for
> >           the respective log stream will be created when log records
> >           are written.
> >   - Re-run test 1 and 2
> >
> > Note:
> > The current logtest uses a hard-coded log root path that may not
> > work on some systems and that is not the default path found in the
> > log service configuration object. A ticket for logtest has been
> > written [#541]. A fix exists, and patches can be found in the
> > review request for ticket #541.
> >
> >
> > Testing, Expected Results:
> > --------------------------
> > 1.   Regression test with no fail.
> > 2.   "alarm message 1" and "alarm message 2" found in the same file.
> > 3.1. Regression test with no fail.
> > 3.2. "alarm message 1" and "alarm message 2" found in the same file.
> >
> >
> > Conditions of Submission:
> > -------------------------
> >   <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
> >
> >
> > Arch      Built     Started    Linux distro
> > -------------------------------------------
> > mips        n          n
> > mips64      n          n
> > x86         n          n
> > x86_64      n          n
> > powerpc     n          n
> > powerpc64   n          n
> >
> >
> > Reviewer Checklist:
> > -------------------
> > [Submitters: make sure that your review doesn't trigger any
> > checkmarks!]
> >
> >
> > Your checkin has not passed review because (see checked entries):
> >
> > ___ Your RR template is generally incomplete; it has too many blank entries
> >      that need proper data filled in.
> >
> > ___ You have failed to nominate the proper persons for review and push.
> >
> > ___ Your patches do not have proper short+long header
> >
> > ___ You have grammar/spelling in your header that is unacceptable.
> >
> > ___ You have exceeded a sensible line length in your
> >      headers/comments/text.
> >
> > ___ You have failed to put in a proper Trac Ticket # into your commits.
> >
> > ___ You have incorrectly put/left internal data in your comments/files
> >      (i.e. internal bug tracking tool IDs, product names etc)
> >
> > ___ You have not given any evidence of testing beyond basic build tests.
> >      Demonstrate some level of runtime or other sanity testing.
> >
> > ___ You have ^M present in some of your files. These have to be removed.
> >
> > ___ You have needlessly changed whitespace or added whitespace crimes
> >      like trailing spaces, or spaces before tabs.
> >
> > ___ You have mixed real technical changes with whitespace and other
> >      cosmetic code cleanup changes. These have to be separate commits.
> >
> > ___ You need to refactor your submission into logical chunks; there is
> >      too much content in a single commit.
> >
> > ___ You have extraneous garbage in your review (merge commits etc)
> >
> > ___ You have giant attachments which should never have been sent;
> >      Instead you should place your content in a public tree to be pulled.
> >
> > ___ You have too many commits attached to an e-mail; resend as threaded
> >      commits, or place in a public tree for a pull.
> >
> > ___ You have resent this content multiple times without a clear indication
> >      of what has changed between each re-send.
> >
> > ___ You have failed to adequately and individually address all of the
> >      comments and change requests that were proposed in the initial review.
> >
> > ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
> >
> > ___ Your computer has a badly configured date and time, confusing
> >      the threaded patch review.
> >
> > ___ Your changes affect IPC mechanism, and you don't present any results
> >      for in-service upgradability test.
> >
> > ___ Your changes affect user manual and documentation, but your patch
> >      series does not contain the patch that updates the Doxygen manual.
> >


_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
