I've had a patch for this soak testing for some time on our problematic test server. It's still leaking a few handles, but it hasn't crashed in a week or so now, which is a substantial improvement, so I'm taking that as confirmation that I've at least fixed *something*.

I couldn't quite get it down to a single NioSocketConnector, as connection-timeout is a property of the connector, not the connection. It now pools Connectors per timeout value and disposes of them using a reference count of active connections. I filed bug NMS-4846 with details and attached the patch there.

Thanks,
Duncan Mackintosh (dijm)

________________________________________
From: Duncan Mackintosh [dmackint...@cbnl.com]
Sent: 06 July 2011 13:20
To: opennms-devel@lists.sourceforge.net
Subject: [opennms-devel] Provisiond, "Too many open files" and Mina

I've been doing a lot of digging around various 'Too many open files' crashes we've been seeing locally, and I think I've pinned down a big leak of file descriptors in provisiond's use of org.apache.mina connectors.

What it's currently doing in AsyncBasicDetector#isServiceDetected:
- For each service, create a new NioSocketConnector
- Configure that connector with a handler, filters etc
- Make a connection out, check for results etc

There seem to be two problems with this approach:

1) Constructing an NioSocketConnector creates a lot of 'anon_inode' and 'pipe' file descriptors (under Linux, at least; I assume there is some equivalent under Windows). On one machine it was 8 and 12 respectively and on another 4 and 8, so I'm not sure quite what the difference is there. The actual connect() call only uses one more handle. This causes it to run out of descriptors a lot faster than expected.

2) If new NioSocketConnector() fails with a "Too many open files" exception, Mina sometimes just sort of falls over dead with "NoClassDefFoundError: Could not initialize class sun.nio.ch.FileDispatcher". This class does exist in my JVM (openjdk 6), and if I reflectively inspect it first, that sometimes stops the crashes happening. I'm pretty baffled there, to be honest. If it does get itself into this state, you can't close existing sockets and you can't open new ones; all the anon_inode and pipe FDs just sit there.
This seems to tally with behaviour we've witnessed in OpenNMS instances where we've had a Too many open files crash: lsof shows a few thousand pipe/anon_inode handles just sitting around long after the crash.

For reference, I've attached a simple test class that just opens ~60 connections using the current methodology (you'll need mina-core and slf4j-log4j12 in a project to run it). If you lsof the process while it pauses, you can see how many new file descriptors are being created each time. If you drop the 60 down to 50 it cleans up gracefully, but at 60 it doesn't seem possible to free the descriptors again. I'd be quite interested to see if others get the same behaviour I do.

What I think Mina wants you to be doing is creating a single NioSocketConnector to reuse everywhere, and using the optional IoSessionInitializer in .connect() to configure filters and attach state objects to the IoSession. This would take a moderate overhaul of AsyncBasicDetector, and probably a fair chunk of refactoring at the same time: the handler would need to be rewritten as a singleton that takes its state via IoSession.get/setAttribute, rather than having one handler per service detection attempt.

Before I embark on making those changes I wanted to throw this out there for comment, and to see if there's already a refactor of this code planned (I couldn't see any changes on the provisiond-refactor branch yet).

Thanks,
Duncan Mackintosh (dijm)

Cambridge Broadband Networks Limited
Registered in England and Wales under company number: 03879840
Registered office: Selwyn House, Cambridge Business Park, Cowley Road, Cambridge CB4 0WZ, UK.
VAT number: GB 741 0186 64
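
For anyone who wants a feel for the per-timeout pooling mentioned in the follow-up at the top of this thread without opening the patch on NMS-4846, here is a minimal sketch. It is not the patch itself: TimeoutKeyedPool and Disposable are hypothetical names, Disposable merely stands in for NioSocketConnector (only its dispose() call matters here), and no Mina classes are used so that it runs on a plain JDK 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical stand-in for a Mina connector; only dispose() matters here. */
interface Disposable {
    void dispose();
}

/** Hypothetical pool: one shared resource per connect timeout, reference counted. */
final class TimeoutKeyedPool<T extends Disposable> {

    /** A pooled resource plus the number of callers currently using it. */
    private static final class Entry<R> {
        final R resource;
        int refs;
        Entry(R resource) { this.resource = resource; }
    }

    private final Map<Integer, Entry<T>> entries = new HashMap<>();
    private final IntFunction<T> factory;

    /** factory builds a new resource configured for the given timeout in millis. */
    TimeoutKeyedPool(IntFunction<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared resource for this timeout, creating it on first use. */
    synchronized T acquire(int timeoutMillis) {
        Entry<T> e = entries.get(timeoutMillis);
        if (e == null) {
            e = new Entry<>(factory.apply(timeoutMillis));
            entries.put(timeoutMillis, e);
        }
        e.refs++;
        return e.resource;
    }

    /** Drops one reference; the last release disposes the resource. */
    synchronized void release(int timeoutMillis) {
        Entry<T> e = entries.get(timeoutMillis);
        if (e == null) {
            throw new IllegalStateException("nothing acquired for timeout " + timeoutMillis);
        }
        if (--e.refs == 0) {
            entries.remove(timeoutMillis);
            e.resource.dispose();
        }
    }

    /** Number of distinct timeouts that currently have a live resource. */
    synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        TimeoutKeyedPool<Disposable> pool =
                new TimeoutKeyedPool<>(t -> () -> System.out.println("disposed " + t + "ms"));
        pool.acquire(3000);
        pool.acquire(3000);   // reuses the 3000ms resource
        pool.release(3000);   // still referenced, nothing disposed
        pool.release(3000);   // last reference gone, resource disposed
    }
}
```

The point of the design is that each detect attempt calls acquire() before connecting and release() once its connection has closed, so the expensive descriptors are only paid for once per distinct timeout. In this sketch dispose() runs while the pool lock is held, which keeps the code short; a real connector's dispose() can block, so doing it outside the lock may be preferable.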