Hi Martin,

I believe you are using 4.1.0-RC4 with some custom changes you made locally. Could you test this on the latest commit of the stratos-4.1.x branch (without any other changes)? I cannot recall a fix we made for this after 4.1.0-RC4, but it would be better if you could verify with the latest code in the stratos-4.1.x branch.
At the same time, could you do the following:
- Take a thread dump of the running Stratos and CEP instances once this happens
- Check the file descriptor limits of the OS

Thanks

On Sat, Sep 12, 2015 at 10:56 PM, Martin Eppel (meppel) <mep...@cisco.com> wrote:
> Resending in case it got lost,
>
> Thanks
>
> Martin
>
> *From:* Martin Eppel (meppel)
> *Sent:* Thursday, September 10, 2015 2:39 PM
> *To:* dev@stratos.apache.org
> *Subject:* Stratos 4.1: "Too many open files" issue
>
> Hi,
>
> We are seeing an issue with Stratos running out of file handles when creating a number of applications and VM instances.
>
> The scenario is as follows:
>
> 13 applications are deployed, each with a single cluster and a single member instance.
>
> As the VMs spin up, Stratos becomes unresponsive, and in the logs we find the following exceptions (see below). I remember we saw similar issues (the same exceptions) back in Stratos 4.0 in the context of longevity tests.
>
> We are running Stratos 4.1 RC4 with the latest commit:
>
> commit 0fd41840fb04d92ba921bf58c59c2c3fbad0c561
> Author: Imesh Gunaratne <im...@apache.org>
> Date: Tue Jul 7 12:54:47 2015 +0530
>
> Is this a known issue that might have been fixed in a later commit, or is it something new? Can we verify that the fixes for the previous issues are included in our system (jars, commits, etc.)?
>
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>     at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-08-17 17:38:17,499] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketException: Too many open files
>
> // Listing the applications, member instances and cartridge state:
>
> [di-000-xxx] – application name
>
> di-000-010: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
> di-000-011: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Initialized 1)
> cartridge-proxy: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
> di-000-001: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
> di-000-002: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
> di-000-012: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Created 1)
> di-000-003: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
> di-000-004: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
> di-000-006: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
> di-000-005: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
> di-000-008: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
> di-000-007: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
> di-000-009: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)

--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
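To collect the file descriptor information requested above for the Stratos and CEP processes, a minimal Java sketch could read the Linux /proc filesystem: it counts the entries in /proc/<pid>/fd (the descriptors the process currently holds) and reads the soft and hard "Max open files" values from /proc/<pid>/limits. This assumes a Linux host and permission to read the target process's /proc entries; FdCheck is a hypothetical helper for illustration, not part of Stratos. For the thread dump, the JDK's jstack tool (jstack -l <pid>) or kill -3 <pid>, which makes the JVM print a dump to its standard output, can be pointed at the same PIDs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

/**
 * Hypothetical diagnostic helper (not part of Stratos). Reports how many file
 * descriptors a process holds and its "Max open files" limit.
 * Assumes Linux with /proc mounted and read access to /proc/<pid>.
 */
public class FdCheck {

    /** Counts entries in /proc/<pid>/fd, i.e. the descriptors currently open. */
    static long countOpenFds(String pid) throws IOException {
        // Files.list keeps a directory handle open, so close the stream.
        try (Stream<Path> fds = Files.list(Paths.get("/proc", pid, "fd"))) {
            return fds.count();
        }
    }

    /** Returns {soft, hard} from the "Max open files" row of /proc/<pid>/limits. */
    static String[] maxOpenFiles(String pid) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc", pid, "limits"));
        for (String line : lines) {
            if (line.startsWith("Max open files")) {
                // Row layout: "Max open files  <soft>  <hard>  files";
                // values may be a number or "unlimited", so keep them as strings.
                String[] cols = line.trim().split("\\s+");
                return new String[] {cols[3], cols[4]};
            }
        }
        throw new IOException("No 'Max open files' row in /proc/" + pid + "/limits");
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID of the Stratos or CEP JVM; defaults to this JVM ("self").
        String pid = args.length > 0 ? args[0] : "self";
        String[] limit = maxOpenFiles(pid);
        System.out.printf("pid=%s open=%d soft=%s hard=%s%n",
                pid, countOpenFds(pid), limit[0], limit[1]);
    }
}
```

Running it a few times while the VMs spin up (for example, java FdCheck <pid>) shows whether the open count climbs toward the soft limit. Steady growth would hint at a descriptor leak, while a count that levels off close to a low limit would suggest the OS limit itself is too small for this workload.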