It looks like we can consistently reproduce this whenever a connection fails: there seems to be a file descriptor leak every time a connection attempt fails.
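One way to confirm that every failed attempt really costs a descriptor is to count the process's open descriptors around a batch of connects that are certain to fail. The sketch below is plain JDK code, not HttpCore. It assumes an OpenJDK/HotSpot JVM on a Unix-like OS, where the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean, and the class and method names are hypothetical. It also illustrates the invariant a connect-failure path has to keep: the channel, and any selector opened for it, is closed on every exit path.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical helper: checks whether failed connection attempts leak descriptors. */
class FdLeakCheck {

    /** Open descriptor count for this process, or -1 if this JVM cannot report it. */
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Binds an ephemeral port and releases it, so a connect to it is (almost certainly) refused. */
    static int closedPort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    /**
     * Non-blocking connect with a timeout. try-with-resources closes the channel
     * and the selector on every path, including the failure paths.
     */
    static boolean tryConnect(int port, long timeoutMs) {
        try (Selector selector = Selector.open();
             SocketChannel ch = SocketChannel.open()) {
            ch.configureBlocking(false);
            if (ch.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port))) {
                return true; // connected immediately
            }
            ch.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMs) == 0) {
                return false; // timed out
            }
            return ch.finishConnect(); // throws ConnectException when refused
        } catch (IOException e) {
            return false; // refused or unreachable; both resources are already closed
        }
    }

    /** Runs n failing attempts and returns the change in the open descriptor count. */
    static long leakedAfter(int n) throws IOException {
        int port = closedPort();
        tryConnect(port, 1000); // warm-up, so lazy JDK initialisation is not counted
        long before = openFds();
        for (int i = 0; i < n; i++) {
            tryConnect(port, 1000);
        }
        return openFds() - before;
    }

    public static void main(String[] args) throws IOException {
        if (openFds() < 0) {
            System.out.println("descriptor count not available on this JVM");
            return;
        }
        System.out.println("descriptor delta after 100 failed connects: " + leakedAfter(100));
    }
}
```

If the delta stays at zero here but grows in the real application, the leak is more likely in the code that handles the failure than in the JDK's connect itself. Note that in OpenJDK a channel closed while still registered with a selector keeps its descriptor until that selector deregisters the cancelled key (on its next select, or when the selector is closed).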
We registered a SessionRequestCallback, and each time public void failed(final SessionRequest request) is called we seem to leak a file descriptor. Perhaps there is a bug where the descriptor is not cleaned up properly when a connection attempt fails?

Hiranya Jayathilaka-3 wrote:
>
> On Tue, Nov 2, 2010 at 12:48 AM, swatkatz <[email protected]> wrote:
>>
>> Hello,
>>
>> We seem to be experiencing this as well when using NIO. We are using JDK 1.6 Update 21.
>
> This bug should be fixed in JDK 1.6 build 21. At least that's what all the evidence suggests. We have never been able to reproduce the issue on this particular JDK version.
>
> Thanks,
> Hiranya
>
> Any ideas what the workaround/fix is?
>>
>> Regards,
>> Mohan
>>
>> olegk wrote:
>>>
>>> On Thu, 2010-07-15 at 12:50 -0700, Harold Lee wrote:
>>>> I've put together a simple HTTP server that resets the connection after sending part of the response back to the client. I'm going to try to recreate the bug (leaking sockets) by making many requests against that server from a Linux box. I'll let you know what I find.
>>>>
>>>> Harold
>>>
>>>> On Wed, Jul 14, 2010 at 1:44 AM, Oleg Kalnichevski <[email protected]> wrote:
>>>> > On Tue, 2010-07-13 at 13:32 -0700, Harold Lee wrote:
>>>> >> Regarding this JDK bug:
>>>> >> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
>>>> >>
>>>> >> I think we are experiencing this using HttpCore on Linux with Java 1.6. We wind up leaking socket descriptors until the JVM process runs out. We also wind up having to start a new reactor thread, which creates a new Selector. The old reactor thread keeps running, and the thread dump shows it in sun.nio.ch.EPollArrayWrapper.epollWait, as reported by others in the bug report above.
>>>
>>> Hi Harold
>>>
>>> Did you have any luck reproducing the problem?
>>>
>>> I put together a work-around for the bug that causes the epoll spin problem [1]. If you are interested in trying it out, I will happily share it with you. The work-around is pretty ugly, so I want to be sure there is no other way of solving the issue.
>>>
>>> cheers
>>>
>>> Oleg
>>>
>>> [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
>>>
>>>> > Folks
>>>> >
>>>> > Has anyone experienced anything like that? The bug looks pretty old, but there have been no reports of similar problems with HttpCore NIO. I am using Linux / JDK 1.6 on a daily basis when hacking on HttpCore, but I have not encountered such a problem yet.
>>>> >
>>>> >> Here's the change that the Glassfish team made to work around this JDK bug:
>>>> >>
>>>> >> http://fisheye5.cenqua.com/browse/glassfish/appserv-http-engine/src/java/com/sun/enterprise/web/connector/grizzly/ByteBufferInputStream.java?r1=1.8&r2=1.9
>>>> >>
>>>> >> From my reading, the Glassfish code is much simpler than the HttpCore NIO code: they're registering interest for just one socket and using Selector.select() to wait for data from that socket. For HttpCore NIO, it isn't yet clear to me how we can detect which selector is "trashed" in order to cancel it and recreate it.
>>>> >>
>>>> >> I'm working on a workaround in AbstractMultiworkerIOReactor.java. If selector.select returns 0 (setting readyCount to 0), then we don't know whether this bug hit us or we just had a timeout.
>>>> >
>>>> > The problem is that it is perfectly valid for a selector to return a ready count of 0. This condition alone is not sufficient to assume the selector is trashed.
>>>> >
>>>> >> To be safe, I think we need to cancel every registered SelectionKey and then call selector.selectNow() to flush them. Then we can reregister each channel, creating a new SelectionKey for it.
>>>> >> The only way to make it less common, I think, is to use a long selectTimeout value so that the odds of a timeout are low. Ugly, but I hope it will work.
>>>> >
>>>> > This will unfortunately screw up the handling of new / closed channels as well as the timeout logic.
>>>> >
>>>> > The work-around looks butt ugly and would require tons of fairly complex code. Is there a way to reproduce the issue with a test scenario, so we could look for alternative approaches?
>>>> >
>>>> > Cheers
>>>> >
>>>> > Oleg
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>
>> --
>> View this message in context:
>> http://old.nabble.com/HttpCore-NIO-hurt-by-JDK-bug--tp29155405p30107703.html
>> Sent from the HttpComponents-Dev mailing list archive at Nabble.com.
>
> --
> Hiranya Jayathilaka
> Senior Software Engineer;
> WSO2 Inc.; http://wso2.org
> E-mail: [email protected]; Mobile: +94 77 633 3491
> Blog: http://techfeast-hiranya.blogspot.com
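Harold's reproduction idea, a server that sends part of a response and then resets the connection, can be sketched with plain blocking sockets. Everything here is an assumption for illustration (the class name, the 1000-byte Content-Length, the loopback-only bind); it is not the server Harold built. The key detail is that enabling SO_LINGER with a zero timeout before close() makes the TCP stack abort the connection with an RST instead of a normal FIN.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

/** Hypothetical test server: sends a truncated HTTP response, then resets the connection. */
class ResettingServer implements AutoCloseable {
    private static final byte[] PARTIAL_RESPONSE =
            "HTTP/1.1 200 OK\r\nContent-Length: 1000\r\n\r\npartial".getBytes(StandardCharsets.US_ASCII);

    private final ServerSocket server;

    ResettingServer() throws IOException {
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "resetting-server");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try (Socket s = server.accept()) {
                skipRequestHeaders(s.getInputStream());
                OutputStream out = s.getOutputStream();
                out.write(PARTIAL_RESPONSE); // promises 1000 body bytes, sends 7
                out.flush();
                // Linger enabled with a zero timeout: the implicit close() sends RST.
                s.setSoLinger(true, 0);
            } catch (IOException e) {
                // accept() throws once close() is called; other errors just drop this connection
            }
        }
    }

    /** Consumes bytes up to and including the blank line that ends the request headers. */
    private static void skipRequestHeaders(InputStream in) throws IOException {
        int matched = 0; // length of the "\r\n\r\n" prefix matched so far
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            char expected = (matched % 2 == 0) ? '\r' : '\n';
            if (b == expected) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    /** Client side: sends one request and reports whether the response was reset or truncated. */
    static boolean responseCutShort(int port) throws IOException {
        try (Socket c = new Socket(InetAddress.getLoopbackAddress(), port)) {
            c.setSoTimeout(5000);
            OutputStream out = c.getOutputStream();
            out.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            try {
                int n;
                while ((n = c.getInputStream().read(buf)) != -1) {
                    received.write(buf, 0, n);
                }
            } catch (SocketTimeoutException hung) {
                return false; // the server neither finished nor reset the connection
            } catch (IOException reset) {
                return true; // typically "Connection reset"
            }
            // Clean EOF: a complete response would exceed 1000 bytes (headers plus body).
            return received.size() < 1000;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ResettingServer srv = new ResettingServer()) {
            System.out.println("response cut short: " + responseCutShort(srv.port()));
        }
    }
}
```

Pointing the NIO client under test at port() in a loop, while watching the process's open descriptor count, would give the kind of reproducible test scenario Oleg asks for above.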
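Harold's proposed rebuild (cancel every key, flush, re-register each channel on a new selector) and Oleg's objection (a ready count of 0 is perfectly legal) can be reconciled by counting only zero returns that come back far earlier than the requested timeout, and rebuilding only after a long uninterrupted run of them. This is a hedged sketch, not HttpCore's or Oleg's actual work-around: the class name, the 512-iteration threshold and the "less than half the timeout" test are all assumptions. Netty's NioEventLoop uses a similar counter-based selector rebuild.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/**
 * Hypothetical guard against the epoll spin bug (JDK bug 6403933).
 * A single zero return from select(timeout) is legal (wakeup(), timeouts), so the
 * selector is only rebuilt after SPIN_THRESHOLD consecutive zero returns that came
 * back well before the timeout. Assumes a positive timeout and a single reactor thread.
 */
class SelectorRebuilder {
    static final int SPIN_THRESHOLD = 512; // assumed value; tune for the workload

    private Selector selector;
    private int prematureZeroReturns;

    SelectorRebuilder(Selector selector) {
        this.selector = selector;
    }

    /** The current selector; it changes after a rebuild, so never cache it. */
    Selector selector() {
        return selector;
    }

    /** Like selector.select(timeoutMs), but rebuilds the selector when it appears to spin. */
    int select(long timeoutMs) throws IOException {
        long start = System.nanoTime();
        int ready = selector.select(timeoutMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        boolean premature = ready == 0 && elapsedMs < timeoutMs / 2
                && !Thread.currentThread().isInterrupted();
        if (!premature) {
            prematureZeroReturns = 0;
        } else if (++prematureZeroReturns >= SPIN_THRESHOLD) {
            rebuild();
        }
        return ready;
    }

    /** Moves every valid registration (interest ops and attachment) to a fresh selector. */
    void rebuild() throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : selector.keys()) {
            if (!key.isValid()) {
                continue; // already cancelled, or its channel is closed
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            channel.register(fresh, ops, attachment);
        }
        // Closing the old selector deregisters the cancelled keys, which takes the
        // place of the selectNow() flush proposed in the thread.
        selector.close();
        selector = fresh;
        prematureZeroReturns = 0;
    }
}
```

A reactor loop would call select(timeout) on the guard instead of on the selector, and always fetch the current selector through selector(), since the instance is replaced after a rebuild. Because a rebuild only happens after many premature zero returns in a row, ordinary wakeups and timeouts leave the channel and timeout handling that Oleg is worried about untouched.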
