Re: HttpCore NIO hurt by JDK bug?

Harold Lee Thu, 15 Jul 2010 13:08:01 -0700

I've put together a simple HTTP server that resets the connection
after sending part of the response back to the client. I'm going to
try to recreate the bug (leaking sockets) by making many requests
against that server from a Linux box. I'll let you know what I find.
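
For reference, here is a minimal sketch of that kind of server, assuming
plain blocking java.net sockets. The class name ResettingHttpServer, the
canned response, and the use of SO_LINGER with a zero timeout to force a
reset are illustrative assumptions, not the actual test code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical test server: promises a 1000-byte body, sends 5 bytes,
// then aborts the connection instead of closing it cleanly.
public final class ResettingHttpServer implements Runnable, AutoCloseable {
    private final ServerSocket server;

    public ResettingHttpServer(int port) throws IOException {
        this.server = new ServerSocket(port); // port 0 picks a free port
    }

    public int port() {
        return server.getLocalPort();
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            try (Socket s = server.accept()) {
                // The request is deliberately not read; the response is canned.
                OutputStream out = s.getOutputStream();
                String partial = "HTTP/1.1 200 OK\r\n"
                        + "Content-Length: 1000\r\n"
                        + "\r\n"
                        + "hello";
                out.write(partial.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                // SO_LINGER with timeout 0 makes close() abort the connection
                // (typically a TCP RST) rather than send a normal FIN.
                s.setSoLinger(true, 0);
            } catch (IOException e) {
                // accept() throws once the server socket is closed; the loop
                // condition then ends the thread.
            }
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }
}
```

Whether the client sees the partial bytes before the reset depends on
timing and on the OS, so this only approximates a mid-response failure.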


Harold

On Wed, Jul 14, 2010 at 1:44 AM, Oleg Kalnichevski <[email protected]> wrote:
> On Tue, 2010-07-13 at 13:32 -0700, Harold Lee wrote:
>> Regarding this JDK bug:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
>>
>> I think we are experiencing this using HttpCore on Linux with Java
>> 1.6. We wind up leaking socket descriptors until the JVM process runs
>> out. We also wind up having to start a new reactor thread, which
>> creates a new Selector. The old reactor thread keeps running and the
>> thread dump shows it in sun.nio.ch.EPollArrayWrapper.epollWait as
>> reported by others in the bug report above.
>>
>
> Folks
>
> Anyone experienced anything like that? The bug looks pretty old, but there
> have been no reports of similar problems with HttpCore NIO. I am using
> Linux / JDK 1.6 on a daily basis when hacking on HttpCore but I have not
> encountered such a problem yet.
>
>
>> Here's the change that the Glassfish team made to work around this JDK bug:
>>
>> http://fisheye5.cenqua.com/browse/glassfish/appserv-http-engine/src/java/com/sun/enterprise/web/connector/grizzly/ByteBufferInputStream.java?r1=1.8&r2=1.9
>>
>> From my reading, the Glassfish code is much simpler than the HttpCore
>> NIO code: they're registering interest for just 1 socket and using
>> Selector.select() to wait for data from that socket. For HttpCore NIO,
>> it isn't yet clear to me how we can detect which selector is "trashed"
>> in order to cancel it and recreate it.
>>
>> I'm working on a workaround in AbstractMultiworkerIOReactor.java. If
>> selector.select returns 0 (setting readyCount to 0) then we don't know
>> whether this bug hit us or we just had a timeout.
>
> The problem is that it is perfectly valid for a selector to return 0
> ready count. This condition alone is not sufficient to assume the
> selector is trashed.
>
>
>>  To be safe, I think
>> we need to cancel every registered SelectionKey and then call
>> selector.selectNow() to flush them. Then we can create a new
>> SelectionKey for each and reregister them. The only way to make it less
>> common, I think, is to use a long selectTimeout value so that the odds
>> of a timeout are low. Ugly, but I hope it will work.
>>
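
The rebuild step quoted above could look roughly like the sketch below.
It assumes it runs on the reactor thread that owns the selector, and it
closes the old selector rather than calling selectNow() to flush the
cancelled keys. The class and method names are hypothetical; nothing
like this exists in HttpCore today.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper: moves every valid registration from a suspect
// selector to a freshly opened one.
public final class SelectorRebuild {
    private SelectorRebuild() {
    }

    public static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue; // already cancelled, or its channel is closed
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            // cancel() only marks the key; the key set itself is not
            // modified until the selector is next used or closed.
            key.cancel();
            channel.register(fresh, ops, attachment);
        }
        old.close(); // deregisters whatever is left on the old selector
        return fresh;
    }
}
```

This does nothing about deciding when a rebuild is needed, which is the
hard part.
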
>
> This will unfortunately screw up handling of new / closed channels as
> well as timeout logic.
>
> The work-around looks butt ugly and would require tons of fairly complex
> code. Is there a way to reproduce the issue with a test scenario, so we
> could look for alternative approaches?
>
> Cheers
>
> Oleg
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
