I will implement this as a separate check with a printout if SA_AIS_ERR_TIMEOUT
/Lennart > -----Original Message----- > From: Lennart Lund [mailto:lennart.l...@ericsson.com] > Sent: den 16 augusti 2013 14:14 > To: Anders Widell > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix hanging > main thread when file i/o dont return [#9] > > Hi, > > Is it ok to just add SA_AIS_ERR_TIMEOUT to the try again loop (poll_retry)? > > Existing code: > if (cb_error == SA_AIS_ERR_TRY_AGAIN) { > usleep(100000); /* 100 ms */ > try_agains++; > goto retry; > } > > Changed code: > retry: > if ((cb_error == SA_AIS_ERR_TRY_AGAIN) || (errorCode == > SA_AIS_ERR_TIMEOUT)) { > usleep(100000); /* 100 ms */ > try_agains++; > goto retry; > } > > Thanks' > Lennart > > > -----Original Message----- > > From: Anders Widell > > Sent: den 16 augusti 2013 12:25 > > To: Lennart Lund > > Cc: opensaf-devel@lists.sourceforge.net > > Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix > > hanging main thread when file i/o dont return [#9] > > > > One more comment: > > > > * saf_logger.c needs to be updated to handle the error code > > SA_AIS_ERR_TIMEOUT > > > > regards, > > Anders Widell > > > > 2013-08-16 12:16, Lennart Lund skrev: > > > Hi, > > > > > > See my comments to Anders W below > > > > > > Thanks' > > > Lennart > > > > > >> -----Original Message----- > > >> From: Anders Widell [mailto:anders.wid...@ericsson.com] > > >> Sent: den 16 augusti 2013 10:47 > > >> To: opensaf-devel@lists.sourceforge.net > > >> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix > > >> hanging main thread when file i/o dont return [#9] > > >> > > >> Hi! > > >> > > >> I have some comments: > > >> > > >> * Time-out limits shall be configurable (according to our new > > >> policy on > > >> limits) > > > Already done. See patch, part 12 > > > > > >> * I think the use code for killing and re-starting the slave thread > > >> is overkill and should be removed. Unless we know (and have seen) > > >> that this solves a real problem that can happen in practice. > > > Ok, I will remove it > > > > > >> regards, > > >> Anders Widell > > >> > > >> 2013-08-09 15:20, Lennart Lund skrev: > > >>> Summary: logsv: Fix hanging main thread when file i/o don't return > > >>> Review request for Trac Ticket(s): #9 Peer Reviewer(s): Madhurika > > >>> Koppula, (Anders Widell, Hans Feldt) Pull request to: NA Affected > > >>> branch(es): devel (4.4) Development branch: <<IF ANY GIVE THE REPO > > >>> URL>> > > >>> > > >>> > > >>> -------------------------------- > > >>> Impacted area Impact y/n > > >>> -------------------------------- > > >>> Docs n > > >>> Build system n > > >>> RPM/packaging n > > >>> Configuration files n > > >>> Startup scripts n > > >>> SAF services y > > >>> OpenSAF services n > > >>> Core libraries n > > >>> Samples n > > >>> Tests n > > >>> Other n > > >>> > > >>> > > >>> Comments (indicate scope for each "y" above): > > >>> --------------------------------------------- > > >>> In order to protect the log server "main thread" (MT) from hanging > > >>> if a file operation like write, mkdir etc. does not return, all > > >>> such operations are done in a separate "file thread" (FT). > > >>> Functions running in the "Main Thread" (MT) that needs file system > > >>> operations handle over the execution to the FT when file handling > > >>> has to be done. Execution is then given back to the MT again. If a > > >>> file operation does not return FT will hang but MT will time out > > >>> the FT and > > >> resume. A timeout will be handled as a file operation fail. > > >>> The MT can detect if the FT is hanging and new requests for file > > >>> operations > > >> will be "failed". > > >>> Note1: This is an add on to the patches sent out in prevoius > > >>> review > > >> requests. > > >>> Note2: The last patch (part 11); The non block handling of log > > >>> files that was > > >> suggested by Madhurika > > >>> is contained in its' own patch. > > >>> > > >>> > > >>> changeset 43a3e4173f05a01aa595b3d770a1464a3338f32e > > >>> Author: Lennart Lund <lennart.l...@ericsson.com> > > >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 > > >>> > > >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. > > >>> Part > > >>> 9 > > >>> > > >>> - Remove unnecessary data copying in log_file_api() and > > >> file_hndl_thread() > > >>> - Return SA_AIS_ERR_TIMEOUT if the write operation time out when > > >> a log > > >>> record shall be written. If the file thread is already > > >>> "hanging" when a > > >>> write is requested no attempt to write is made and > > >> SA_AIS_ERR_TRY_AGAIN is > > >>> returned as before. > > >>> - Try to recover file thread by recreating it if it hangs for a > > >>> long time. > > >>> - Recover if bad file descriptor or stale NFS handle. > > >>> > > >>> changeset d3d78dc3ad87e083e411e0ce5436bcca511d54d6 > > >>> Author: Lennart Lund <lennart.l...@ericsson.com> > > >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 > > >>> > > >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. > > >>> Part > > >>> 10 > > >>> > > >>> - Always reinitialize/reopen log files if a write operation > > >>> fails, timeout > > >>> of file thread (hanging file system) included. > > >>> - Handle synchronization between nodes when log files cannot be > > >> created before > > >>> a switch over without using any new flag that has to be > > >>> checkpointed > > >>> (remove "files_initialized" flag) > > >>> - Incorrect handling of "partial write" is fixed. See #536 > > >>> > > >>> changeset b8a2060ff5fd6d2685f95757d422addeca1ebdb0 > > >>> Author: Lennart Lund <lennart.l...@ericsson.com> > > >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 > > >>> > > >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. > > >>> Part > > >>> 11 > > >>> > > >>> - Open log files with O_NONBLOCK. Answer client with > > >> AIS_ERR_TIMEOUT if > > >>> EWOULDBLOCK/EAGAIN (record may be parially written) > > >>> > > >>> > > >>> Complete diffstat: > > >>> ------------------ > > >>> osaf/services/saf/logsv/lgs/lgs_evt.c | 11 ++-- > > >>> osaf/services/saf/logsv/lgs/lgs_file.c | 197 > > >> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >> +++++------------------------- > > >>> osaf/services/saf/logsv/lgs/lgs_filehdl.c | 2 +- > > >>> osaf/services/saf/logsv/lgs/lgs_imm.c | 1 - > > >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.c | 7 --- > > >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.h | 3 - > > >>> osaf/services/saf/logsv/lgs/lgs_stream.c | 115 > > >> +++++++++++++++++++++------------------------------ > > >>> osaf/services/saf/logsv/lgs/lgs_stream.h | 1 - > > >>> 8 files changed, 196 insertions(+), 141 deletions(-) > > >>> > > >>> > > >>> Testing Commands: > > >>> ----------------- > > >>> See previous review > > >>> > > >>> > > >>> Testing, Expected Results: > > >>> -------------------------- > > >>> <<PASTE COMMAND OUTPUTS / TEST RESULTS>> > > >>> > > >>> > > >>> Conditions of Submission: > > >>> ------------------------- > > >>> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>> > > >>> > > >>> > > >>> Arch Built Started Linux distro > > >>> ------------------------------------------- > > >>> mips n n > > >>> mips64 n n > > >>> x86 n n > > >>> x86_64 n n > > >>> powerpc n n > > >>> powerpc64 n n > > >>> > > >>> > > >>> Reviewer Checklist: > > >>> ------------------- > > >>> [Submitters: make sure that your review doesn't trigger any > > >>> checkmarks!] > > >>> > > >>> > > >>> Your checkin has not passed review because (see checked entries): > > >>> > > >>> ___ Your RR template is generally incomplete; it has too many > > >>> blank > > entries > > >>> that need proper data filled in. > > >>> > > >>> ___ You have failed to nominate the proper persons for review and > > push. > > >>> > > >>> ___ Your patches do not have proper short+long header > > >>> > > >>> ___ You have grammar/spelling in your header that is unacceptable. > > >>> > > >>> ___ You have exceeded a sensible line length in your > > >> headers/comments/text. > > >>> ___ You have failed to put in a proper Trac Ticket # into your commits. > > >>> > > >>> ___ You have incorrectly put/left internal data in your comments/files > > >>> (i.e. internal bug tracking tool IDs, product names etc) > > >>> > > >>> ___ You have not given any evidence of testing beyond basic build > tests. > > >>> Demonstrate some level of runtime or other sanity testing. > > >>> > > >>> ___ You have ^M present in some of your files. These have to be > > removed. > > >>> > > >>> ___ You have needlessly changed whitespace or added whitespace > > crimes > > >>> like trailing spaces, or spaces before tabs. > > >>> > > >>> ___ You have mixed real technical changes with whitespace and other > > >>> cosmetic code cleanup changes. These have to be separate > commits. > > >>> > > >>> ___ You need to refactor your submission into logical chunks; there is > > >>> too much content into a single commit. > > >>> > > >>> ___ You have extraneous garbage in your review (merge commits etc) > > >>> > > >>> ___ You have giant attachments which should never have been sent; > > >>> Instead you should place your content in a public tree to be > > >>> pulled. > > >>> > > >>> ___ You have too many commits attached to an e-mail; resend as > > threaded > > >>> commits, or place in a public tree for a pull. > > >>> > > >>> ___ You have resent this content multiple times without a clear > > indication > > >>> of what has changed between each re-send. > > >>> > > >>> ___ You have failed to adequately and individually address all of the > > >>> comments and change requests that were proposed in the > > >>> initial > > review. > > >>> > > >>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email > > >>> etc) > > >>> > > >>> ___ Your computer have a badly configured date and time; confusing > > the > > >>> the threaded patch review. > > >>> > > >>> ___ Your changes affect IPC mechanism, and you don't present any > > results > > >>> for in-service upgradability test. > > >>> > > >>> ___ Your changes affect user manual and documentation, your patch > > series > > >>> do not contain the patch that updates the Doxygen manual. > > >>> > > >>> > > >>> ------------------------------------------------------------------ > > >>> -- > > >>> -- > > >>> -------- Get 100% visibility into Java/.NET code with AppDynamics > > >>> Lite! > > >>> It's a free troubleshooting tool designed for production. > > >>> Get down to code-level detail for bottlenecks, with <2% overhead. > > >>> Download for free and get started troubleshooting in minutes. > > >>> > > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg > > >>> .c lktrk _______________________________________________ > > >>> Opensaf-devel mailing list > > >>> Opensaf-devel@lists.sourceforge.net > > >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel > > >>> > > >>> > > >> > > >> ------------------------------------------------------------------- > > >> -- > > >> --------- Get 100% visibility into Java/.NET code with AppDynamics > > >> Lite! > > >> It's a free troubleshooting tool designed for production. > > >> Get down to code-level detail for bottlenecks, with <2% overhead. > > >> Download for free and get started troubleshooting in minutes. > > >> > > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg. > > >> clk > > >> trk > > >> _______________________________________________ > > >> Opensaf-devel mailing list > > >> Opensaf-devel@lists.sourceforge.net > > >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel > > > ------------------------------------------------------------------------------ > Get 100% visibility into Java/.NET code with AppDynamics Lite! > It's a free troubleshooting tool designed for production. > Get down to code-level detail for bottlenecks, with <2% overhead. > Download for free and get started troubleshooting in minutes. > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clk > trk > _______________________________________________ > Opensaf-devel mailing list > Opensaf-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/opensaf-devel ------------------------------------------------------------------------------ Get 100% visibility into Java/.NET code with AppDynamics Lite! It's a free troubleshooting tool designed for production. Get down to code-level detail for bottlenecks, with <2% overhead. Download for free and get started troubleshooting in minutes. http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel