Hi Martin,

I think we fixed this problem by uplifting the Thrift agent feature in [1]. According to [2], the root cause was that the Thrift agent fails to evict the old connection after periodic publishing. The fix is described in [3]. Your stack trace looks very similar to the one reported in JIRA.
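To confirm whether descriptors are leaking, before and after applying the fix, the JVM can report its own open and maximum file descriptor counts. This is only a minimal sketch and not part of Stratos: the class name `FdUsage` and the method `descriptorUsage` are made up for illustration. It assumes a Unix-like OS and a HotSpot/OpenJDK JVM, where the platform bean implements `com.sun.management.UnixOperatingSystemMXBean`. It only measures the JVM it runs in, so it would have to run inside the Stratos or CEP process, or be adapted to do so.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdUsage {

    /**
     * Returns {open, max} file descriptor counts for the current JVM,
     * or null when the platform bean does not expose them
     * (for example on non-Unix platforms).
     */
    static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = descriptorUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this platform");
        } else {
            System.out.println("open=" + usage[0] + " max=" + usage[1]);
        }
    }
}
```

An open count that keeps climbing toward the maximum while the members spin up would be consistent with the connection leak described in [2].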
Can you check whether you have this fix applied?

[1] https://github.com/apache/stratos/commit/8985d96eb811aa8e9ce2c114f1856b4c4e20517b
[2] https://issues.apache.org/jira/browse/STRATOS-723
[3] https://issues.apache.org/jira/browse/STRATOS-739

Thanks.

On Sun, Sep 13, 2015 at 10:56 AM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Martin,
>
> I believe you are using 4.1.0-RC4 with some custom changes you have done
> locally. Will you be able to test this on stratos-4.1.x branch latest
> commit (without having any other changes)? I cannot recall a fix we did
> after 4.1.0-RC4 for this but it would be better if you can verify with the
> latest code in stratos-4.1.x branch.
>
> At the same time will you be able to do following:
>
> - Take a thread dump of the running Stratos, CEP instances once this
>   happens
> - Check the file descriptor limits of the OS
>
> Thanks
>
> On Sat, Sep 12, 2015 at 10:56 PM, Martin Eppel (meppel) <mep...@cisco.com>
> wrote:
>
>> Resending in case it got lost,
>>
>> Thanks
>>
>> Martin
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Thursday, September 10, 2015 2:39 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* Stratos 4.1: "Too many open files" issue
>>
>> Hi,
>>
>> We are seeing an issue with stratos running out of file handles when
>> creating a number of applications and VM instances:
>>
>> The scenario is as follows:
>>
>> 13 applications are deployed, each with a single cluster and a single
>> member instance,
>>
>> As the VMs spin up stratos becomes unresponsive and checking the logs we
>> find the following exceptions (see below). I remember we had seen similar
>> issues (same exceptions) back in stratos 4.0 in the context of longevity
>> tests.
>>
>> We are running stratos 4.1 RC4 with the latest commit
>>
>> commit 0fd41840fb04d92ba921bf58c59c2c3fbad0c561
>> Author: Imesh Gunaratne <im...@apache.org>
>> Date: Tue Jul 7 12:54:47 2015 +0530
>>
>> Is this a known issue which might have been fixed in a later commit or
>> something new? Can we verify that the fixes for the previous issues are
>> included in our system (jars, commits, etc.)?
>>
>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>>     at java.lang.Thread.run(Thread.java:745)
>> TID: [0] [STRATOS] [2015-08-17 17:38:17,499] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message.
>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.net.SocketException: Too many open files
>>
>> // listing the applications, member instances and cartridge state:
>>
>> [di-000-xxx] – application name,
>>
>> di-000-010: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>> di-000-011: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Initialized 1)
>> cartridge-proxy: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>> di-000-001: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>> di-000-002: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>> di-000-012: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Created 1)
>> di-000-003: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>> di-000-004: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>> di-000-006: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>> di-000-005: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>> di-000-008: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>> di-000-007: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>> di-000-009: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos

--
Akila Ravihansa Perera
WSO2 Inc.; http://wso2.com/

Blog: http://ravihansa3000.blogspot.com