Fixing the e-poll problem...

2009-06-16 Thread Emmanuel Lecharny

Hi guys,

We have a set of JIRA issues referring to a well-known bug in Java 5, 6 and 7 
(up to b55 for Java 7). Basically, there is a nasty bug in the select() method. 
The issue is described in 
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933:


 This is an issue with poll (and epoll) on Linux. If a file descriptor 
for a connected socket is polled with a request event mask of 0, and if 
the connection is abruptly terminated (RST) then the poll wakes up with 
the POLLHUP (and maybe POLLERR) bit set in the returned event set. The 
implication of this behaviour is that Selector will wakeup and as the 
interest set for the SocketChannel is 0 it means there aren't any 
selected events and the select method returns 0.


I have baked a small patch against this problem. The idea is to check whether 
select(timeout) returns too quickly. It would have been easier if we had only 
used select() in the IoProcessor but, sadly, we use this timeout so that idle 
sessions can be detected in this loop (a major mistake, IMO). However...


Here is the proposed solution:

    for (;;) {
        long t0 = System.currentTimeMillis();
        int selected = select(SELECT_TIMEOUT);
        long t1 = System.currentTimeMillis();

        if (selected == 0) {
            // Nothing selected and we came back too quickly:
            // we are probably hitting the NIO bug
            if ((t1 - t0) < 100) {
                // Switch the selectors
                registerNewSelector();
            }
        }

        // process the selected keys now ...
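
For anyone wondering what "switching the selectors" could involve, here is a minimal sketch. It is not the code from the select-fix branch: the class name SelectorSwitch and the method rebuild() are hypothetical stand-ins for registerNewSelector(), and the sketch assumes that only the selecting thread touches the keys while they are being moved. It opens a fresh Selector, re-registers every valid channel with the same interest ops and attachment, then closes the old one.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.util.ArrayList;
import java.util.List;

class SelectorSwitch {

    /**
     * Hypothetical stand-in for registerNewSelector(): moves all valid
     * registrations from oldSelector to a new Selector and returns it.
     */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();

        // Work on a copy so we never iterate the selector's live key set
        List<SelectionKey> keys = new ArrayList<>(oldSelector.keys());

        for (SelectionKey key : keys) {
            if (!key.isValid()) {
                continue; // already cancelled, nothing to move
            }
            SelectableChannel channel = key.channel();
            // Same interest ops and attachment on the new selector
            channel.register(newSelector, key.interestOps(), key.attachment());
        }

        // Closing the old selector invalidates and deregisters its keys
        oldSelector.close();
        return newSelector;
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT, "acceptor");

        selector = rebuild(selector);
        System.out.println("keys on new selector: " + selector.keys().size());

        selector.close();
        server.close();
    }
}
```

The caller has to replace its selector reference with the returned one before the next select() call, otherwise it would keep selecting on a closed selector.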


OK, so far so good, but it's not enough. Another reason we might return 
early from select(SELECT_TIMEOUT) is that some other thread called 
selector.wakeup(). We have to deal with that. I have added a flag, false 
by default and set to true by the wakeup() method, so that we can be sure 
we are really hitting the NIO bug. The code looks like this:


    for (;;) {
        try {
            long t0 = System.currentTimeMillis();
            int selected = select(SELECT_TIMEOUT);

            synchronized (wakeupCalled) {
                long t1 = System.currentTimeMillis();

                if (selected == 0) {
                    // Nobody woke us up on purpose...
                    if (!wakeupCalled.get()) {
                        // ...and we came back too quickly: NIO bug
                        if ((t1 - t0) < 100) {
                            registerNewSelector();
                        }
                    }
                }

                // Reset the flag for the next iteration
                wakeupCalled.getAndSet(false);
            }

            nSessions += handleNewSessions();

and in the wakeup() method:

    protected void wakeup() {
        synchronized (wakeupCalled) {
            wakeupCalled.getAndSet(true);
            selector.wakeup();
        }
    }
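
To make the classification logic easier to try out in isolation, here is a self-contained sketch of the same idea. It is an illustration under a few assumptions, not the branch code: the names SpinDetector, Outcome and selectOnce() are hypothetical, it measures elapsed time with System.nanoTime() instead of System.currentTimeMillis(), and it relies on AtomicBoolean.getAndSet() to read and reset the flag in one step rather than on a synchronized block.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.atomic.AtomicBoolean;

class SpinDetector {

    enum Outcome { READY, WAKEUP, TIMEOUT, SUSPECTED_SPIN }

    // The 100 ms threshold used in the loop above
    static final long SPIN_THRESHOLD_MS = 100;

    private final Selector selector;
    private final AtomicBoolean wakeupCalled = new AtomicBoolean(false);

    SpinDetector(Selector selector) {
        this.selector = selector;
    }

    /** Raise the flag first, so the select loop sees it once it wakes up. */
    void wakeup() {
        wakeupCalled.set(true);
        selector.wakeup();
    }

    /** Run one select(timeout) and say why it returned. */
    Outcome selectOnce(long timeoutMs) throws IOException {
        long t0 = System.nanoTime();
        int selected = selector.select(timeoutMs);
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000L;

        // Read and reset the flag atomically
        boolean woken = wakeupCalled.getAndSet(false);

        if (selected > 0) {
            return Outcome.READY;          // real I/O events to process
        }
        if (woken) {
            return Outcome.WAKEUP;         // another thread asked us to wake up
        }
        if (elapsedMs < SPIN_THRESHOLD_MS) {
            return Outcome.SUSPECTED_SPIN; // early return with no reason: NIO bug?
        }
        return Outcome.TIMEOUT;            // normal expiry of the timeout
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            SpinDetector detector = new SpinDetector(selector);
            detector.wakeup();
            System.out.println("after wakeup(): " + detector.selectOnce(1000));
            System.out.println("no wakeup():    " + detector.selectOnce(300));
        }
    }
}
```

The timeout passed to selectOnce() has to be larger than the threshold for the check to mean anything. A thread interrupt also makes select() return early, and this sketch would report it as SUSPECTED_SPIN.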

I have created a branch (select-fix) for that. Please test it and give 
me some feedback!


Thanks!

--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org




Re: Fixing the e-poll problem...

2009-06-16 Thread Martin Jordan
On Tuesday 16 June 2009 11:54:02 Emmanuel Lecharny wrote:
> Hi guys,
>
> we have a set of JIRA referring to a well known bug in Java 5-6-7 (up to
> b55 for Java 7). Basically, there is a nasty bug in the select() method.
> The issue is described in
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933 :
>
> [...]
>
> I have created a branch (select-fix) for that. Please test it and give
> me some feedback !
>
> Thanks !

Hi Emmanuel,

First of all, thanks for your effort in trying to work around the epoll problem.

After a short test with multiple and simultaneous random input to MINA 
(connect, write on socket, disconnect), I got the following exceptions:

http://fanti.staff.spin.de/m7_log2.txt

An exception is thrown once every second. 

When this error occurs, it seems that MINA neither accepts new connections nor 
reads from existing ones (setup: 1 NioSocketAcceptor with 1 NioProcessor).



-- 
Martin Jordan, SPiN AG
fa...@spin.de
http://www.spin-ag.de

SPiN AG, Bischof-von-Henle-Str. 2b
93051 Regensburg, HRB 6295 Regensburg
Aufsichtsratsvors.: Dr. Christian Kirnberger
Vorstaende: Fabian Rott, Paul Schmid