I will implement this as a separate check with a printout if SA_AIS_ERR_TIMEOUT

/Lennart

> -----Original Message-----
> From: Lennart Lund [mailto:lennart.l...@ericsson.com]
> Sent: den 16 augusti 2013 14:14
> To: Anders Widell
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix hanging
> main thread when file i/o dont return [#9]
> 
> Hi,
> 
> Is it ok to just add SA_AIS_ERR_TIMEOUT to the try again loop (poll_retry)?
> 
> Existing code:
>       if (cb_error == SA_AIS_ERR_TRY_AGAIN) {
>               usleep(100000); /* 100 ms */
>               try_agains++;
>               goto retry;
>       }
> 
> Changed code:
> retry:
>       if ((cb_error == SA_AIS_ERR_TRY_AGAIN) || (errorCode ==
> SA_AIS_ERR_TIMEOUT)) {
>               usleep(100000); /* 100 ms */
>               try_agains++;
>               goto retry;
>       }
> 
> Thanks'
> Lennart
> 
> > -----Original Message-----
> > From: Anders Widell
> > Sent: den 16 augusti 2013 12:25
> > To: Lennart Lund
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix
> > hanging main thread when file i/o dont return [#9]
> >
> > One more comment:
> >
> > * saf_logger.c needs to be updated to handle the error code
> > SA_AIS_ERR_TIMEOUT
> >
> > regards,
> > Anders Widell
> >
> > 2013-08-16 12:16, Lennart Lund skrev:
> > > Hi,
> > >
> > > See my comments to Anders W below
> > >
> > > Thanks'
> > > Lennart
> > >
> > >> -----Original Message-----
> > >> From: Anders Widell [mailto:anders.wid...@ericsson.com]
> > >> Sent: den 16 augusti 2013 10:47
> > >> To: opensaf-devel@lists.sourceforge.net
> > >> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix
> > >> hanging main thread when file i/o dont return [#9]
> > >>
> > >> Hi!
> > >>
> > >> I have some comments:
> > >>
> > >> * Time-out limits shall be configurable (according to our new
> > >> policy on
> > >> limits)
> > > Already done. See patch, part 12
> > >
> > >> * I think the use code for killing and re-starting the slave thread
> > >> is overkill and should be removed. Unless we know (and have seen)
> > >> that this solves a real problem that can happen in practice.
> > > Ok, I will remove it
> > >
> > >> regards,
> > >> Anders Widell
> > >>
> > >> 2013-08-09 15:20, Lennart Lund skrev:
> > >>> Summary: logsv: Fix hanging main thread when file i/o don't return
> > >>> Review request for Trac Ticket(s): #9 Peer Reviewer(s): Madhurika
> > >>> Koppula, (Anders Widell, Hans Feldt) Pull request to: NA Affected
> > >>> branch(es): devel (4.4) Development branch: <<IF ANY GIVE THE REPO
> > >>> URL>>
> > >>>
> > >>>
> > >>> --------------------------------
> > >>> Impacted area       Impact y/n
> > >>> --------------------------------
> > >>>    Docs                    n
> > >>>    Build system            n
> > >>>    RPM/packaging           n
> > >>>    Configuration files     n
> > >>>    Startup scripts         n
> > >>>    SAF services            y
> > >>>    OpenSAF services        n
> > >>>    Core libraries          n
> > >>>    Samples                 n
> > >>>    Tests                   n
> > >>>    Other                   n
> > >>>
> > >>>
> > >>> Comments (indicate scope for each "y" above):
> > >>> ---------------------------------------------
> > >>> In order to protect the log server "main thread" (MT) from hanging
> > >>> if a file operation like write, mkdir etc. does not return, all
> > >>> such operations are done in a separate "file thread" (FT).
> > >>> Functions running in the "Main Thread" (MT) that needs file system
> > >>> operations handle over the execution to the FT when file handling
> > >>> has to be done. Execution is then given back to the MT again. If a
> > >>> file operation does not return FT will hang but MT will time out
> > >>> the FT and
> > >> resume. A timeout will be handled as a file operation fail.
> > >>> The MT can detect if the FT is hanging and new requests for file
> > >>> operations
> > >> will be "failed".
> > >>> Note1: This is an add on to the patches sent out in prevoius
> > >>> review
> > >> requests.
> > >>> Note2: The last patch (part 11); The non block handling of log
> > >>> files that was
> > >> suggested by Madhurika
> > >>>          is contained in its' own patch.
> > >>>
> > >>>
> > >>> changeset 43a3e4173f05a01aa595b3d770a1464a3338f32e
> > >>> Author: Lennart Lund <lennart.l...@ericsson.com>
> > >>> Date:   Fri, 09 Aug 2013 13:46:36 +0200
> > >>>
> > >>>         logsv: Fix hanging main thread when file i/o don't return. [#9].
> > >>> Part
> > >>> 9
> > >>>
> > >>>         - Remove unnecessary data copying in log_file_api() and
> > >> file_hndl_thread()
> > >>>         - Return SA_AIS_ERR_TIMEOUT if the write operation time out when
> > >> a log
> > >>>         record shall be written. If the file thread is already 
> > >>> "hanging" when a
> > >>>         write is requested no attempt to write is made and
> > >> SA_AIS_ERR_TRY_AGAIN is
> > >>>         returned as before.
> > >>>         - Try to recover file thread by recreating it if it hangs for a 
> > >>> long time.
> > >>>         - Recover if bad file descriptor or stale NFS handle.
> > >>>
> > >>> changeset d3d78dc3ad87e083e411e0ce5436bcca511d54d6
> > >>> Author: Lennart Lund <lennart.l...@ericsson.com>
> > >>> Date:   Fri, 09 Aug 2013 13:46:36 +0200
> > >>>
> > >>>         logsv: Fix hanging main thread when file i/o don't return. [#9].
> > >>> Part
> > >>> 10
> > >>>
> > >>>         - Always reinitialize/reopen log files if a write operation 
> > >>> fails, timeout
> > >>>         of file thread (hanging file system) included.
> > >>>         - Handle synchronization between nodes when log files cannot be
> > >> created before
> > >>>         a switch over without using any new flag that has to be 
> > >>> checkpointed
> > >>>         (remove "files_initialized" flag)
> > >>>         - Incorrect handling of "partial write" is fixed. See #536
> > >>>
> > >>> changeset b8a2060ff5fd6d2685f95757d422addeca1ebdb0
> > >>> Author: Lennart Lund <lennart.l...@ericsson.com>
> > >>> Date:   Fri, 09 Aug 2013 13:46:36 +0200
> > >>>
> > >>>         logsv: Fix hanging main thread when file i/o don't return. [#9].
> > >>> Part
> > >>> 11
> > >>>
> > >>>         - Open log files with O_NONBLOCK. Answer client with
> > >> AIS_ERR_TIMEOUT if
> > >>>         EWOULDBLOCK/EAGAIN (record may be parially written)
> > >>>
> > >>>
> > >>> Complete diffstat:
> > >>> ------------------
> > >>>    osaf/services/saf/logsv/lgs/lgs_evt.c     |   11 ++--
> > >>>    osaf/services/saf/logsv/lgs/lgs_file.c    |  197
> > >>
> >
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> +++++-------------------------
> > >>>    osaf/services/saf/logsv/lgs/lgs_filehdl.c |    2 +-
> > >>>    osaf/services/saf/logsv/lgs/lgs_imm.c     |    1 -
> > >>>    osaf/services/saf/logsv/lgs/lgs_mbcsv.c   |    7 ---
> > >>>    osaf/services/saf/logsv/lgs/lgs_mbcsv.h   |    3 -
> > >>>    osaf/services/saf/logsv/lgs/lgs_stream.c  |  115
> > >> +++++++++++++++++++++------------------------------
> > >>>    osaf/services/saf/logsv/lgs/lgs_stream.h  |    1 -
> > >>>    8 files changed, 196 insertions(+), 141 deletions(-)
> > >>>
> > >>>
> > >>> Testing Commands:
> > >>> -----------------
> > >>> See previous review
> > >>>
> > >>>
> > >>> Testing, Expected Results:
> > >>> --------------------------
> > >>>    <<PASTE COMMAND OUTPUTS / TEST RESULTS>>
> > >>>
> > >>>
> > >>> Conditions of Submission:
> > >>> -------------------------
> > >>>    <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
> > >>>
> > >>>
> > >>> Arch      Built     Started    Linux distro
> > >>> -------------------------------------------
> > >>> mips        n          n
> > >>> mips64      n          n
> > >>> x86         n          n
> > >>> x86_64      n          n
> > >>> powerpc     n          n
> > >>> powerpc64   n          n
> > >>>
> > >>>
> > >>> Reviewer Checklist:
> > >>> -------------------
> > >>> [Submitters: make sure that your review doesn't trigger any
> > >>> checkmarks!]
> > >>>
> > >>>
> > >>> Your checkin has not passed review because (see checked entries):
> > >>>
> > >>> ___ Your RR template is generally incomplete; it has too many
> > >>> blank
> > entries
> > >>>       that need proper data filled in.
> > >>>
> > >>> ___ You have failed to nominate the proper persons for review and
> > push.
> > >>>
> > >>> ___ Your patches do not have proper short+long header
> > >>>
> > >>> ___ You have grammar/spelling in your header that is unacceptable.
> > >>>
> > >>> ___ You have exceeded a sensible line length in your
> > >> headers/comments/text.
> > >>> ___ You have failed to put in a proper Trac Ticket # into your commits.
> > >>>
> > >>> ___ You have incorrectly put/left internal data in your comments/files
> > >>>       (i.e. internal bug tracking tool IDs, product names etc)
> > >>>
> > >>> ___ You have not given any evidence of testing beyond basic build
> tests.
> > >>>       Demonstrate some level of runtime or other sanity testing.
> > >>>
> > >>> ___ You have ^M present in some of your files. These have to be
> > removed.
> > >>>
> > >>> ___ You have needlessly changed whitespace or added whitespace
> > crimes
> > >>>       like trailing spaces, or spaces before tabs.
> > >>>
> > >>> ___ You have mixed real technical changes with whitespace and other
> > >>>       cosmetic code cleanup changes. These have to be separate
> commits.
> > >>>
> > >>> ___ You need to refactor your submission into logical chunks; there is
> > >>>       too much content into a single commit.
> > >>>
> > >>> ___ You have extraneous garbage in your review (merge commits etc)
> > >>>
> > >>> ___ You have giant attachments which should never have been sent;
> > >>>       Instead you should place your content in a public tree to be 
> > >>> pulled.
> > >>>
> > >>> ___ You have too many commits attached to an e-mail; resend as
> > threaded
> > >>>       commits, or place in a public tree for a pull.
> > >>>
> > >>> ___ You have resent this content multiple times without a clear
> > indication
> > >>>       of what has changed between each re-send.
> > >>>
> > >>> ___ You have failed to adequately and individually address all of the
> > >>>       comments and change requests that were proposed in the
> > >>> initial
> > review.
> > >>>
> > >>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email
> > >>> etc)
> > >>>
> > >>> ___ Your computer have a badly configured date and time; confusing
> > the
> > >>>       the threaded patch review.
> > >>>
> > >>> ___ Your changes affect IPC mechanism, and you don't present any
> > results
> > >>>       for in-service upgradability test.
> > >>>
> > >>> ___ Your changes affect user manual and documentation, your patch
> > series
> > >>>       do not contain the patch that updates the Doxygen manual.
> > >>>
> > >>>
> > >>> ------------------------------------------------------------------
> > >>> --
> > >>> --
> > >>> -------- Get 100% visibility into Java/.NET code with AppDynamics
> > >>> Lite!
> > >>> It's a free troubleshooting tool designed for production.
> > >>> Get down to code-level detail for bottlenecks, with <2% overhead.
> > >>> Download for free and get started troubleshooting in minutes.
> > >>>
> > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg
> > >>> .c lktrk _______________________________________________
> > >>> Opensaf-devel mailing list
> > >>> Opensaf-devel@lists.sourceforge.net
> > >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> > >>>
> > >>>
> > >>
> > >> -------------------------------------------------------------------
> > >> --
> > >> --------- Get 100% visibility into Java/.NET code with AppDynamics
> > >> Lite!
> > >> It's a free troubleshooting tool designed for production.
> > >> Get down to code-level detail for bottlenecks, with <2% overhead.
> > >> Download for free and get started troubleshooting in minutes.
> > >>
> > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.
> > >> clk
> > >> trk
> > >> _______________________________________________
> > >> Opensaf-devel mailing list
> > >> Opensaf-devel@lists.sourceforge.net
> > >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> 
> 
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clk
> trk
> _______________________________________________
> Opensaf-devel mailing list
> Opensaf-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel

------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead. 
Download for free and get started troubleshooting in minutes. 
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to