I've been doing a lot of digging around various 'Too many open files' crashes
we've been seeing locally, and I think I've pinned down a big leak of file
descriptors in provisiond's use of org.apache.mina connectors.
What it's currently doing in AsyncBasicDetector#isServiceDetected:
- For each service, create a new NioSocketConnector
- Configure that connector with a handler, filters, etc.
- Make a connection out, check for results, etc.
There seem to be two problems with this approach:
1) Constructing an NioSocketConnector creates a lot of 'anon_inode' and 'pipe'
file descriptors (under Linux, at least; I assume there's some equivalent under
Windows). On one machine it was 8 and 12 respectively and on another 4 and 8,
so I'm not quite sure what the difference is there. The actual connect() call
only uses one more handle. This makes it run out of descriptors a lot faster
than expected.
2) If new NioSocketConnector() fails with a "Too many open files" exception,
Mina sometimes just sort of falls over dead with
"NoClassDefFoundError: Could not initialize class sun.nio.ch.FileDispatcher".
This class does exist in my JVM (OpenJDK 6), and if I reflectively inspect it
first, the crashes sometimes stop happening. I'm pretty baffled there, to be
honest. Once it gets itself into this state, you can't close existing sockets
or open new ones, and all the anon_inode and pipe FDs just sit there. This
seems to tally with behaviour we've witnessed in OpenNMS instances that have
hit a 'Too many open files' crash: lsof shows a few thousand pipe/anon_inode
handles still sitting around long after the crash.
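On the FileDispatcher puzzle: "Could not initialize class" is the JVM's
standard message for a class whose static initializer has already failed once.
The first use throws ExceptionInInitializerError, and every later use throws
NoClassDefFoundError, even after whatever caused the original failure has gone
away. My unverified guess is that FileDispatcher's static initializer needs a
descriptor of its own, so if the first thing to touch it runs while we're
already out of descriptors, it's broken for the rest of the JVM's life. If
that's right, the commented-out reflection block in the attached test class
would "fix" things simply by forcing the class to initialise early, while
descriptors are still free. The toy program below shows only the JVM
behaviour; FailedInitDemo and NeedsResourceAtInit are made-up names, and the
latter is a hypothetical stand-in for FileDispatcher:

```java
// Shows the JVM rule behind "Could not initialize class": if a static
// initializer throws, the class is marked as failed for good.
public class FailedInitDemo {

    // Simulates whether a descriptor is available at the moment the class
    // below is first initialised.
    static boolean resourcesExhausted = true;

    // Hypothetical stand-in for a class that needs a resource in its static
    // initializer (my guess at what FileDispatcher does; not verified).
    static class NeedsResourceAtInit {
        static {
            if (resourcesExhausted) {
                throw new IllegalStateException("simulated: Too many open files");
            }
        }

        static void use() {}
    }

    // Calls NeedsResourceAtInit.use() and returns the simple name of whatever
    // it throws, or "ok" if it succeeds.
    static String tryUse() {
        try {
            NeedsResourceAtInit.use();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // prints "First use: ExceptionInInitializerError"
        System.out.println("First use: " + tryUse());
        resourcesExhausted = false; // the resource comes back...
        // ...but the class stays broken: prints "Second use: NoClassDefFoundError"
        System.out.println("Second use: " + tryUse());
    }
}
```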
For reference, I've attached a simple test class that just opens ~60
connections using the current methodology (you'll need mina-core and
slf4j-log4j12 on the classpath to run it). If you lsof the process while it
pauses, you can see how many new file descriptors are created each time. If
you drop the 60 down to 50 it cleans up gracefully, but at 60 it doesn't seem
possible to free the descriptors again. I'd be quite interested to see whether
others get the same behaviour I do.
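As an alternative to lsof, here's a rough, Linux-only sketch that counts the
entries in /proc/self/fd around calls to java.nio's Selector.open(), with Mina
out of the picture. My assumption (not checked against the Mina source) is
that most of the per-connector anon_inode and pipe descriptors belong to
Selectors: on Linux each Selector holds an epoll descriptor, which lsof shows
as anon_inode, plus, on the JDKs I know of, a wakeup pipe; and the connector
opens one Selector for itself plus one per I/O processor. If the processor pool
is sized from the CPU count, the totals would differ between machines, which
might explain the 8/12 versus 4/8 difference. SelectorFdCount and its methods
are names made up for the sketch:

```java
import java.io.File;
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

// Linux-only sketch: count this process's open descriptors via /proc/self/fd
// and measure how many a batch of java.nio Selectors adds.
public class SelectorFdCount {

    // Number of entries in /proc/self/fd, or -1 if it can't be read (e.g. not
    // Linux). Listing the directory briefly opens one descriptor itself, but
    // that happens on every call, so differences between calls are unaffected.
    static int countOpenFds() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    // Opens n selectors, measures the descriptor delta, then closes them all.
    // Returns -1 if /proc/self/fd is unavailable.
    static int fdsForSelectors(int n) throws IOException {
        int before = countOpenFds();
        if (before < 0) {
            return -1;
        }
        List<Selector> selectors = new ArrayList<Selector>();
        try {
            for (int i = 0; i < n; i++) {
                selectors.add(Selector.open());
            }
            return countOpenFds() - before;
        } finally {
            for (Selector s : selectors) {
                s.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Assumption: one selector for the connector plus (cpus + 1) processors
        int perConnector = cpus + 2;
        System.out.println("Available processors: " + cpus);
        System.out.println("Descriptors for 1 selector: " + fdsForSelectors(1));
        System.out.println("Descriptors for " + perConnector + " selectors: "
                + fdsForSelectors(perConnector));
        System.out.println("Descriptors open after closing them: " + countOpenFds());
    }
}
```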
What I think Mina wants you to do is create a single NioSocketConnector, reuse
it everywhere, and use the optional IoSessionInitializer argument to connect()
to configure filters and attach state objects to the IoSession. This would take
a moderate overhaul of AsyncBasicDetector: the handler would need to be
rewritten as a singleton that keeps its per-attempt state via
IoSession.getAttribute/setAttribute, rather than having one handler per
service-detection attempt, and it would probably mean a fair chunk of
refactoring at the same time.
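To illustrate the shape of what I mean without depending on Mina's API, here's
a rough sketch of the same idea in plain java.nio (a stand-in, not Mina code):
one Selector is shared by every probe, and each attempt's state is attached to
its SelectionKey, much as it would be stored with IoSession.setAttribute() on a
shared connector. SharedSelectorProbes and ProbeState are hypothetical names.
Only the socket is per-attempt, so each probe costs one descriptor rather than
a whole connector's worth:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch (plain java.nio, not Mina): one shared Selector for every probe, with
// per-attempt state attached to each SelectionKey instead of a handler per attempt.
public class SharedSelectorProbes {

    // Per-attempt state, playing the role IoSession attributes would in Mina.
    static final class ProbeState {
        final InetSocketAddress target;
        Boolean connected; // null = still pending, or timed out
        ProbeState(InetSocketAddress target) { this.target = target; }
    }

    private final Selector selector; // the only Selector this class creates
    private int pending = 0;         // probes still waiting for a result

    SharedSelectorProbes() throws IOException {
        selector = Selector.open();
    }

    // Starts a non-blocking connect; awaitAll() fills in the result.
    ProbeState start(InetSocketAddress target) throws IOException {
        ProbeState state = new ProbeState(target);
        SocketChannel ch = SocketChannel.open();
        try {
            ch.configureBlocking(false);
            if (ch.connect(target)) {
                state.connected = true; // connected straight away
                ch.close();
            } else {
                ch.register(selector, SelectionKey.OP_CONNECT, state);
                pending++;
            }
        } catch (IOException | UnresolvedAddressException e) {
            state.connected = false;
            ch.close();
        }
        return state;
    }

    // Completes pending connects until none remain or the timeout expires.
    void awaitAll(long timeoutMillis) throws IOException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (pending > 0) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                break;
            }
            selector.select(remaining);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                SocketChannel ch = (SocketChannel) key.channel();
                ProbeState state = (ProbeState) key.attachment();
                try {
                    if (!ch.finishConnect()) {
                        continue; // not finished yet; wait for the next event
                    }
                    state.connected = true;
                } catch (IOException e) {
                    state.connected = false;
                }
                pending--;
                ch.close(); // a real detector would talk to the service first
            }
        }
    }

    // Closes any probes that never completed, then the shared Selector.
    void close() throws IOException {
        for (SelectionKey key : selector.keys()) {
            key.channel().close();
        }
        selector.close();
    }

    public static void main(String[] args) throws IOException {
        InetSocketAddress target = new InetSocketAddress("www.google.com", 80);
        SharedSelectorProbes probes = new SharedSelectorProbes();
        List<ProbeState> states = new ArrayList<ProbeState>();
        for (int i = 0; i < 60; i++) {
            states.add(probes.start(target));
        }
        probes.awaitAll(10000);
        probes.close();
        int ok = 0;
        for (ProbeState s : states) {
            if (Boolean.TRUE.equals(s.connected)) ok++;
        }
        System.out.println(ok + " of " + states.size() + " probes connected");
    }
}
```

In Mina terms, start() corresponds roughly to connector.connect(address,
initializer) on the one shared NioSocketConnector, with the initializer storing
the per-attempt state on the session and the singleton handler reading it back
in its callbacks.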
Before I embark on making those changes I wanted to throw this out there for
comment, and to see if there's already a refactor of this code planned (I
couldn't see any changes on the provisiond-refactor branch yet).
Thanks,
Duncan Mackintosh (dijm)
Cambridge Broadband Networks Limited Registered in England and Wales under
company number: 03879840 Registered office: Selwyn House, Cambridge Business
Park, Cowley Road, Cambridge CB4 0WZ, UK. VAT number: GB 741 0186 64
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.LinkedList;
import java.util.List;

import org.apache.log4j.BasicConfigurator;
import org.apache.mina.core.future.ConnectFuture;
import org.apache.mina.core.service.IoHandler;
import org.apache.mina.core.session.IdleStatus;
import org.apache.mina.core.session.IoSession;
import org.apache.mina.transport.socket.nio.NioSocketConnector;

public class MinaTest {

    // A do-nothing handler; we only care about the descriptors each connector uses
    private static IoHandler handler = new IoHandler() {
        public void sessionOpened(IoSession session) throws Exception {}
        public void sessionIdle(IoSession session, IdleStatus status) throws Exception {}
        public void sessionCreated(IoSession session) throws Exception {}
        public void sessionClosed(IoSession session) throws Exception {}
        public void messageSent(IoSession session, Object message) throws Exception {}
        public void messageReceived(IoSession session, Object message) throws Exception {}
        public void exceptionCaught(IoSession session, Throwable cause) throws Exception {}
    };

    // Wait for the user to press enter. Also stop at end of input; otherwise
    // read() keeps returning -1 and the loop never ends.
    private static void waitForEnter() {
        try {
            int ch;
            do {
                ch = System.in.read();
            } while (ch != '\n' && ch != -1);
        } catch (IOException e) {
            // ignore and carry on
        }
    }

    public static void main(String[] args) {
        // log4j setup
        BasicConfigurator.configure();

        // Uncomment this block to miraculously fix NoClassDefFoundErrors
        // ...sometimes
        /*
        // Confirm that FileDispatcher#closeIntFD(int) actually exists
        try {
            Class<?> clazz = Class.forName("sun.nio.ch.FileDispatcher");
            Method closeIntFd = clazz.getDeclaredMethod("closeIntFD", Integer.TYPE);
            closeIntFd.setAccessible(true);
            closeIntFd.invoke(null, -1);
            System.out.println("Successfully invoked sun.nio.ch.FileDispatcher.closeIntFD(-1)");
        } catch (Exception e1) {
            e1.printStackTrace();
        }
        */

        // Open many connections, each with its own connector, until it crashes
        SocketAddress dest = new InetSocketAddress("www.google.com", 80);
        List<NioSocketConnector> connectors = new LinkedList<NioSocketConnector>();
        List<ConnectFuture> connections = new LinkedList<ConnectFuture>();
        System.out.println("Opening 100 connections to " + dest);
        for (int i = 0; i < 100; i++) {
            // Pause every 10 connections so the process can be inspected with lsof
            if (i % 10 == 0) {
                System.out.println("Opened " + i + " connections. Press enter.");
                waitForEnter();
            }
            try {
                NioSocketConnector c = new NioSocketConnector();
                // Track the connector before connecting so it is disposed even if connect() fails
                connectors.add(c);
                c.setHandler(handler);
                connections.add(c.connect(dest));
            } catch (Exception e) {
                System.out.println("Exception on connect attempt " + i + ": ");
                e.printStackTrace();
                break;
            }
        }

        System.out.println("Added all connections. Press enter to try closing.");
        waitForEnter();
        System.out.println("Trying to close all sessions");
        for (ConnectFuture f : connections) {
            try {
                f.awaitUninterruptibly().getSession().close(true);
            } catch (RuntimeException e) {
                // getSession() throws if this connection attempt failed
                System.out.println("Could not close session: " + e);
            }
        }
        try { Thread.sleep(1000); } catch (InterruptedException e) {}

        System.out.println("Press enter to dispose connectors");
        waitForEnter();
        System.out.println("Trying to dispose the connectors");
        for (NioSocketConnector c : connectors) {
            c.dispose();
        }
        try { Thread.sleep(1000); } catch (InterruptedException e) {}

        System.out.println("Done. Press enter to exit");
        waitForEnter();
        // Need a force exit because the IO threads don't die if
        // everything's crashing
        System.exit(0);
    }
}