Hi!

I have some comments:

* Time-out limits shall be configurable (according to our new policy on 
limits)

* I think the use code for killing and re-starting the slave thread is 
overkill and should be removed. Unless we know (and have seen) that this 
solves a real problem that can happen in practice.

regards,
Anders Widell

2013-08-09 15:20, Lennart Lund skrev:
> Summary: logsv: Fix hanging main thread when file i/o don't return
> Review request for Trac Ticket(s): #9
> Peer Reviewer(s): Madhurika Koppula, (Anders Widell, Hans Feldt)
> Pull request to: NA
> Affected branch(es): devel (4.4)
> Development branch: <<IF ANY GIVE THE REPO URL>>
>
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
> In order to protect the log server "main thread" (MT) from hanging if a file 
> operation
> like write, mkdir etc. does not return, all such operations are done in a 
> separate
> "file thread" (FT).
> Functions running in the "Main Thread" (MT) that needs file system operations
> handle over the execution to the FT when file handling has to be done. 
> Execution
> is then given back to the MT again. If a file operation does not return FT 
> will hang but
> MT will time out the FT and resume. A timeout will be handled as a file 
> operation fail.
> The MT can detect if the FT is hanging and new requests for file operations 
> will be "failed".
>
> Note1: This is an add on to the patches sent out in prevoius review requests.
> Note2: The last patch (part 11); The non block handling of log files that was 
> suggested by Madhurika
>         is contained in its' own patch.
>
>
> changeset 43a3e4173f05a01aa595b3d770a1464a3338f32e
> Author:       Lennart Lund <lennart.l...@ericsson.com>
> Date: Fri, 09 Aug 2013 13:46:36 +0200
>
>       logsv: Fix hanging main thread when file i/o don't return. [#9]. Part 9
>
>       - Remove unnecessary data copying in log_file_api() and 
> file_hndl_thread()
>       - Return SA_AIS_ERR_TIMEOUT if the write operation time out when a log
>       record shall be written. If the file thread is already "hanging" when a
>       write is requested no attempt to write is made and SA_AIS_ERR_TRY_AGAIN 
> is
>       returned as before.
>       - Try to recover file thread by recreating it if it hangs for a long 
> time.
>       - Recover if bad file descriptor or stale NFS handle.
>
> changeset d3d78dc3ad87e083e411e0ce5436bcca511d54d6
> Author:       Lennart Lund <lennart.l...@ericsson.com>
> Date: Fri, 09 Aug 2013 13:46:36 +0200
>
>       logsv: Fix hanging main thread when file i/o don't return. [#9]. Part 10
>
>       - Always reinitialize/reopen log files if a write operation fails, 
> timeout
>       of file thread (hanging file system) included.
>       - Handle synchronization between nodes when log files cannot be created 
> before
>       a switch over without using any new flag that has to be checkpointed
>       (remove "files_initialized" flag)
>       - Incorrect handling of "partial write" is fixed. See #536
>
> changeset b8a2060ff5fd6d2685f95757d422addeca1ebdb0
> Author:       Lennart Lund <lennart.l...@ericsson.com>
> Date: Fri, 09 Aug 2013 13:46:36 +0200
>
>       logsv: Fix hanging main thread when file i/o don't return. [#9]. Part 11
>
>       - Open log files with O_NONBLOCK. Answer client with AIS_ERR_TIMEOUT if
>       EWOULDBLOCK/EAGAIN (record may be parially written)
>
>
> Complete diffstat:
> ------------------
>   osaf/services/saf/logsv/lgs/lgs_evt.c     |   11 ++--
>   osaf/services/saf/logsv/lgs/lgs_file.c    |  197 
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
>   osaf/services/saf/logsv/lgs/lgs_filehdl.c |    2 +-
>   osaf/services/saf/logsv/lgs/lgs_imm.c     |    1 -
>   osaf/services/saf/logsv/lgs/lgs_mbcsv.c   |    7 ---
>   osaf/services/saf/logsv/lgs/lgs_mbcsv.h   |    3 -
>   osaf/services/saf/logsv/lgs/lgs_stream.c  |  115 
> +++++++++++++++++++++------------------------------
>   osaf/services/saf/logsv/lgs/lgs_stream.h  |    1 -
>   8 files changed, 196 insertions(+), 141 deletions(-)
>
>
> Testing Commands:
> -----------------
> See previous review
>
>
> Testing, Expected Results:
> --------------------------
>   <<PASTE COMMAND OUTPUTS / TEST RESULTS>>
>
>
> Conditions of Submission:
> -------------------------
>   <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
>
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      n          n
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer have a badly configured date and time; confusing the
>      the threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>      for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>      do not contain the patch that updates the Doxygen manual.
>
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
>


------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead. 
Download for free and get started troubleshooting in minutes. 
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to