Geir,

Thanks for your reply!

>In code that calls the wrapper for the lowest-level select(), right?

Yes.
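Here is a minimal C sketch of the layering described in this thread. The lowest-level wrapper does not loop. It only translates EINTR into a portable code that mirrors HYPORT_ERROR_SOCKET_INTERRUPTED (=-9). The middle layer propagates that code unchanged, and the outermost caller owns the retry loop. All `sketch_*` names, `fake_select`, and the injected `raw_select_fn` are hypothetical stand-ins, not the actual hysock or DRLVM code. A plain `do`/`while` stands in for the `goto select` the thread mentions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical portable codes. The thread says EINTR is mapped to
 * HYPORT_ERROR_SOCKET_INTERRUPTED (=-9); these names only mirror that. */
#define SKETCH_SOCKET_INTERRUPTED (-9)
#define SKETCH_SOCKET_OTHER_ERROR (-1)

/* Injected stand-in for the raw select() call: returns a select()-style
 * result (>= 0 ready count, -1 on failure with errno set). */
typedef int (*raw_select_fn)(void *ctx);

/* Lowest layer (cf. hysock_select): no loop, only error translation. */
static int sketch_select(raw_select_fn raw, void *ctx)
{
    assert(raw != NULL);
    int rc = raw(ctx);
    if (rc >= 0)
        return rc;                         /* number of ready descriptors */
    return errno == EINTR ? SKETCH_SOCKET_INTERRUPTED
                          : SKETCH_SOCKET_OTHER_ERROR;
}

/* Middle layer (cf. hysock_select_read): propagates the code unchanged. */
static int sketch_select_read(raw_select_fn raw, void *ctx)
{
    return sketch_select(raw, ctx);
}

/* Outermost caller (cf. pollSelectRead): owns the retry loop. */
static int sketch_poll_select_read(raw_select_fn raw, void *ctx)
{
    int rc;
    do {
        rc = sketch_select_read(raw, ctx);
    } while (rc == SKETCH_SOCKET_INTERRUPTED);
    return rc;
}

/* Hypothetical fake: fails with EINTR while *ctx > 0, then reports 1 ready fd. */
static int fake_select(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 1;
}
```

For example, if `fake_select` is told to fail three times with EINTR, `sketch_poll_select_read` calls it four times and returns 1. A real loop around `select()` would also need to recompute its timeout before retrying. On Linux, the timeout argument is undefined after `select()` returns an error.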
pollSelectRead()          // loop is here
  hysock_select_read()    // return code is propagated
    hysock_select()       // EINTR is renamed to HYPORT_ERROR_SOCKET_INTERRUPTED

>I don't understand this - what do you see as the real problem?
>Interruption from select due to signals is a fact of life under Linux.

This is a good question, and I agree 100% with your statement. I will try
to reproduce connection failures on Linux as soon as I overcome my problems
running DRLVM on SuSE Linux. If there are no connection failures, then
there is no problem.

With best regards,
Alexei Fedotov,
Intel Java & XML Engineering

>-----Original Message-----
>From: Geir Magnusson Jr. [mailto:[EMAIL PROTECTED]
>Sent: Tuesday, October 31, 2006 2:30 AM
>To: harmony-dev@incubator.apache.org
>Subject: Re: [classlib][luni] signalis interruptus in hysock
>
>
>Fedotov, Alexei A wrote:
>> Geir, All,
>>
>> I have examined the class library code. It seems that the solution we
>> invented (return EINTR, then loop) was always in place. :-)
>>
>> A few comments on my understanding:
>> 1. EINTR (=4) is renamed to HYPORT_ERROR_SOCKET_INTERRUPTED (=-9).
>
>Yes, I did that in one place to have it fit into the portlib error code
>set. Someone may have done it in another.
>
>> 2. The loop is coded by means of "goto select".
>
>In code that calls the wrapper for the lowest-level select(), right?
>
>> 3. The same pattern is dupdupduplicated several times.
>
>That's another issue entirely :)
>
>>
>> I have not examined all places, so there could be paths which do not
>> fit the pattern. Honestly, I have examined only one path:
>>
>> pollSelectRead() ->
>> hysock_select_read() ->
>> hysock_select()
>>
>> Summary:
>> We can keep this issue open or close it as won't fix. Meanwhile we
>> should look for the real problem.
>
>I don't understand this - what do you see as the real problem?
>Interruption from select due to signals is a fact of life under Linux.
>
>geir
>
>>
>> With best regards,
>> Alexei Fedotov,
>> Intel Java & XML Engineering
>>
>>> -----Original Message-----
>>> From: Geir Magnusson Jr. [mailto:[EMAIL PROTECTED]
>>> Sent: Thursday, October 26, 2006 6:21 PM
>>> To: harmony-dev@incubator.apache.org
>>> Subject: Re: [classlib][luni] signalis interruptus in hysock
>>>
>>>
>>>
>>> Fedotov, Alexei A wrote:
>>>> Geir,
>>>>
>>>> Do I understand correctly that you suggest the following?
>>>>
>>>> 1. hysock_select, as its name says, should mimic the behavior of
>>>> select, i.e. return the error code from select without changing it.
>>>> It's OK to print a rare debug message.
>>> Yes, that's what I had the other one do (and no, I see no reason to
>>> print a debug message, as upper layers can print if they find an EINTR)
>>>
>>>> 2. The correct place for the loop is the module where hysock_select
>>>> is called, or, to be precise, the class lib guys should fix our
>>>> networking code.
>>> My plan is to fix it as I fixed the other one. It turns out that there
>>> are several layers between Java and the OS...
>>>
>>> geir
>>>
>>>>
>>>> With best regards,
>>>> Alexei Fedotov,
>>>> Intel Java & XML Engineering
>>>>
>>>>> -----Original Message-----
>>>>> From: Geir Magnusson Jr. [mailto:[EMAIL PROTECTED]
>>>>> Sent: Wednesday, October 25, 2006 10:01 AM
>>>>> To: harmony-dev@incubator.apache.org
>>>>> Subject: Re: [classlib][luni] signalis interruptus in hysock
>>>>>
>>>>>
>>>>>
>>>>> Weldon Washburn wrote:
>>>>>> It seems JIRA is down for maintenance. If HARMONY-1904 is still
>>>>>> open, perhaps it makes sense to put a counter in the
>>>>>> while (...) { select...} loop and, after every N loops, print a
>>>>>> warning/diagnostic message.
>>>>> For whom and to what end? Why not just return EINTR (in hysock speak)?
>>>>>> The value for N would have to be tuned. I don't know what the best
>>>>>> number would be.
>>>>>> Given that the 1904 patch is not the final solution, at least a
>>>>>> diagnostic that hints at where the system hangs would be useful. It
>>>>>> might make sense to even print a stack trace. Also, I agree with Ivan
>>>>>> below. Signal bugs are very hard to debug, and diagnostics can help
>>>>>> us all understand the corner cases better.
>>>>> But so far, no one has shown that the system hangs, or can hang, simply
>>>>> because we return EINTR....
>>>>>
>>>>> geir
>>>>>
>>>>>> On 10/20/06, Ivan Volosyuk <[EMAIL PROTECTED]> wrote:
>>>>>>> On 10/20/06, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:
>>>>>>>> Ivan Volosyuk wrote:
>>>>>>>>> Well, I think that the solution is what Geir suggests. One thing
>>>>>>>>> that bothers me is the following. EINTR can happen in different
>>>>>>>>> places, and the situations can be quite rare in some circumstances.
>>>>>>>>> This can lead to hard-to-reproduce stability bugs (race conditions).
>>>>>>>> Can you give an example?
>>>>>>> Half a year ago, I was working on this problem. Socket operations
>>>>>>> sometimes got interrupted. We found out that it occurred some time
>>>>>>> after GC. It was not easy, as the application was quite big and the
>>>>>>> situation quite rare.
>>>>>>>
>>>>>>> Given that the current implementation of the monitor reservation
>>>>>>> code can stop other threads in a fairly random fashion, we should
>>>>>>> have rock-solid EINTR handling everywhere select() and poll() calls
>>>>>>> are used.
>>>>>>>
>>>>>>> --
>>>>>>> Ivan
>>>>>>> Intel Enterprise Solutions Software Division
>>>>>>>
>>>>>>>>> We should find a
>>>>>>>>> way to test the implementation.
>>>>>>>> +1!
>>>>>>>>
>>>>>>>> :)
>>>>>>>>
>>>>>>>> geir
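Ivan's request for a way to test EINTR handling, and Alexei's plan to reproduce the failures on Linux, could start from a sketch like this one. It deliberately interrupts a blocking select() with a signal. It assumes a POSIX system. The function name `provoke_select_eintr`, the one-second `alarm()`, and the five-second timeout are arbitrary choices for illustration, not part of hysock or DRLVM. The handler is installed without SA_RESTART. On Linux, select() is never restarted after a signal handler runs, even with SA_RESTART, so the blocked call fails with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Set by the handler so the caller can confirm the signal was delivered. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/* Blocks in select() on the empty read end of a pipe and lets SIGALRM
 * interrupt it after one second. Returns the errno of the failed select()
 * (expected: EINTR), 0 if select() did not fail, or -1 on setup failure. */
static int provoke_select_eintr(void)
{
    int fds[2];
    struct sigaction sa;
    fd_set readfds;
    struct timeval tv;
    int rc, err;

    if (pipe(fds) != 0)
        return -1;
    assert(fds[0] < FD_SETSIZE);  /* FD_SET is undefined beyond FD_SETSIZE */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);     /* nothing is written, so it never becomes readable */
    tv.tv_sec = 5;                /* longer than the alarm below */
    tv.tv_usec = 0;

    alarm(1);
    rc = select(fds[0] + 1, &readfds, NULL, NULL, &tv);
    err = errno;                  /* save before close() can change it */
    alarm(0);

    close(fds[0]);
    close(fds[1]);
    return rc == -1 ? err : 0;
}
```

A test of a real retry loop could wrap the same setup and check that the loop goes back to waiting after the handler runs instead of reporting an error to its caller.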