On Tue, 2010-07-13 at 13:32 -0700, Harold Lee wrote:
> Regarding this JDK bug:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
>
> I think we are experiencing this using HttpCore on Linux with Java
> 1.6. We wind up leaking socket descriptors until the JVM process runs
> out. We also wind up having to start a new reactor thread, which
> creates a new Selector. The old reactor thread keeps running and the
> thread dump shows it in sun.nio.ch.EPollArrayWrapper.epollWait as
> reported by others in the bug report above.
>
Folks,

Has anyone experienced anything like that? The bug looks pretty old, but
there have been no reports of similar problems with HttpCore NIO. I use
Linux / JDK 1.6 on a daily basis when hacking on HttpCore, but I have not
encountered such a problem yet.

> Here's the change that the Glassfish team made to work around this JDK bug:
>
> http://fisheye5.cenqua.com/browse/glassfish/appserv-http-engine/src/java/com/sun/enterprise/web/connector/grizzly/ByteBufferInputStream.java?r1=1.8&r2=1.9
>
> From my reading, the Glassfish code is much simpler than the HttpCore
> NIO code: they're registering interest for just 1 socket and using
> Selector.select() to wait for data from that socket. For HttpCore NIO,
> it isn't yet clear to me how we can detect which selector is "trashed"
> in order to cancel it and recreate it.
>
> I'm working on a workaround in AbstractMultiworkerIOReactor.java. If
> selector.select returns 0 (setting readyCount to 0) then we don't know
> whether this bug hit us or we just had a timeout.

The problem is that it is perfectly valid for a selector to return a zero
ready count. This condition alone is not sufficient to assume the selector
has been trashed.
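A more defensive check would have to combine the zero ready count with the
time actually spent inside select(), and require several premature returns
in a row before declaring the selector broken. Something along the lines of
this rough, untested sketch (the class name and the threshold value are
made up for illustration only):

import java.io.IOException;
import java.nio.channels.Selector;

/**
 * Untested sketch: tries to tell an ordinary select() timeout apart
 * from the epoll spin described in JDK bug 6403933. A zero ready
 * count by itself proves nothing; a zero ready count returned well
 * before the timeout has elapsed, many times in a row, is suspicious.
 */
class SelectorSpinDetector {

    private static final int SPIN_THRESHOLD = 10;

    private int prematureReturns = 0;

    boolean selectLooksTrashed(
            final Selector selector,
            final long timeout) throws IOException {
        final long start = System.currentTimeMillis();
        final int readyCount = selector.select(timeout);
        final long elapsed = System.currentTimeMillis() - start;
        if (readyCount > 0 || elapsed >= timeout) {
            // Real work or an ordinary timeout: certainly not the bug.
            this.prematureReturns = 0;
            return false;
        }
        // Woke up early with nothing ready. Selector#wakeup() or a
        // thread interrupt can also cause this, hence the threshold.
        return ++this.prematureReturns >= SPIN_THRESHOLD;
    }
}

Even with such a check in place, one would still have to rebuild the
selector once the condition is detected, and that is the genuinely hard
part.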
> To be safe, I think
> we need to close every registered SelectorKey and then call
> selector.selectNow() to flush them. Then we can create a new
> SelectorKey for each and reregister them. The only way to make it less
> common, I think, is to use a long selectTimeout value so that the odds
> of a timeout are low. Ugly, but I hope it will work.
>

This will unfortunately screw up the handling of new and closed channels,
as well as the timeout logic. The work-around looks butt ugly and would
require tons of fairly complex code. Is there a way to reproduce the issue
with a test scenario, so we could look for alternative approaches?

Cheers

Oleg

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]