Hi Martin,

Which cartridge agent are you using currently, the Java one or the Python one? This problem was identified on the Thrift data publisher side in Java. Since the Python agent uses a different approach to connect to the data receiver, we will need to verify whether the Python agent is affected by this particular issue. If you could explain which components are connecting to Stratos over Thrift, we can check the Thrift agent version, as Akila mentioned, and find the root cause of this issue.
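For anyone following along, the suspected failure mode on the publisher side looks roughly like the sketch below. This is a hypothetical illustration of the leak, not the actual agent code; the class and method names are invented:

    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransportException;

    // Hypothetical sketch of the leaky publisher pattern: a fresh socket
    // is opened for every periodic publish and never closed. Each cycle
    // pins one file descriptor on the agent and one on the data receiver,
    // until the receiver's accept() fails with "Too many open files".
    public class LeakyPublisherSketch {

        public void publishOnce(String host, int port) throws TTransportException {
            TSocket socket = new TSocket(host, port);
            socket.open();
            // ... serialize the event and write it over the transport ...
            // BUG: socket.close() is never called, so the descriptor leaks.
        }
    }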
Thanks,
Reka

On Sun, Sep 13, 2015 at 12:46 PM, Akila Ravihansa Perera <raviha...@wso2.com> wrote:

> Hi Martin,
>
> I think we fixed this problem by uplifting the Thrift agent feature in [1].
> The root cause of this issue was that after periodic publishing, the Thrift
> agent fails to evict the old connection, as reported in [2]. The fix is
> described in [3]. Your stack trace looks very similar to what has been
> reported in that JIRA.
>
> Can you check whether you have this fix applied?
>
> [1] https://github.com/apache/stratos/commit/8985d96eb811aa8e9ce2c114f1856b4c4e20517b
> [2] https://issues.apache.org/jira/browse/STRATOS-723
> [3] https://issues.apache.org/jira/browse/STRATOS-739
>
> Thanks.
>
> On Sun, Sep 13, 2015 at 10:56 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Hi Martin,
>>
>> I believe you are using 4.1.0-RC4 with some custom changes you have made
>> locally. Will you be able to test this on the latest commit of the
>> stratos-4.1.x branch (without any other changes)? I cannot recall a fix we
>> did for this after 4.1.0-RC4, but it would be better if you can verify
>> with the latest code in the stratos-4.1.x branch.
>>
>> At the same time, will you be able to do the following (commands sketched
>> below this message):
>>
>> - Take a thread dump of the running Stratos and CEP instances once this
>>   happens
>> - Check the file descriptor limits of the OS
>>
>> Thanks
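For reference, on a typical Linux host Imesh's two checks map to commands like the following; <pid> is a placeholder for the Stratos or CEP process ID, and jstack is assumed to be available from the JDK:

    # Thread dump of the running Stratos / CEP JVMs (repeat per process):
    jstack <pid> > thread-dump-$(date +%s).txt

    # File descriptor limit and current usage for a given process:
    grep "open files" /proc/<pid>/limits
    ls /proc/<pid>/fd | wc -l

    # How many of those descriptors are TCP sockets (the suspect here):
    lsof -p <pid> | grep -c TCP

    # Soft limit inherited by processes started from this shell:
    ulimit -n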
>> On Sat, Sep 12, 2015 at 10:56 PM, Martin Eppel (meppel) <mep...@cisco.com> wrote:
>>
>>> Resending in case it got lost,
>>>
>>> Thanks
>>>
>>> Martin
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Thursday, September 10, 2015 2:39 PM
>>> *To:* dev@stratos.apache.org
>>> *Subject:* Stratos 4.1: "Too many open files" issue
>>>
>>> Hi,
>>>
>>> We are seeing an issue with Stratos running out of file handles when
>>> creating a number of applications and VM instances.
>>>
>>> The scenario is as follows: 13 applications are deployed, each with a
>>> single cluster and a single member instance.
>>>
>>> As the VMs spin up, Stratos becomes unresponsive, and checking the logs
>>> we find the exceptions below. I remember we saw similar issues (same
>>> exceptions) back in Stratos 4.0 in the context of longevity tests.
>>>
>>> We are running Stratos 4.1 RC4 with the latest commit:
>>>
>>> commit 0fd41840fb04d92ba921bf58c59c2c3fbad0c561
>>> Author: Imesh Gunaratne <im...@apache.org>
>>> Date: Tue Jul 7 12:54:47 2015 +0530
>>>
>>> Is this a known issue which might have been fixed in a later commit, or
>>> something new? Can we verify that the fixes for the previous issues are
>>> included in our system (jars, commits, etc.)?
>>>
>>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>>>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>>>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> TID: [0] [STRATOS] [2015-08-17 17:38:17,499] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message.
>>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>>>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>>>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.net.SocketException: Too many open files
>>>
>>> // listing the applications, member instances and cartridge state
>>> // ([di-000-xxx] is the application name):
>>>
>>> di-000-010: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>> di-000-011: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Initialized 1)
>>> cartridge-proxy: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>>> di-000-001: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>>> di-000-002: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>>> di-000-012: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Created 1)
>>> di-000-003: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>> di-000-004: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>> di-000-006: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>> di-000-005: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>> di-000-008: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>> di-000-007: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>> di-000-009: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
>>
>> --
>> Imesh Gunaratne
>> Senior Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos

> --
> Akila Ravihansa Perera
> WSO2 Inc.; http://wso2.com/
>
> Blog: http://ravihansa3000.blogspot.com

--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.: http://wso2.com
Mobile: +94776442007
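For readers landing on this thread: the eviction behavior that [3] (STRATOS-739) describes amounts to closing the stale transport before reconnecting, so periodic publishing reuses one descriptor instead of leaking one per cycle. The sketch below is a hypothetical illustration of that pattern, not the actual Stratos patch; the class and method names are invented:

    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;
    import org.apache.thrift.transport.TTransportException;

    // Hypothetical illustration of the eviction pattern: keep a single
    // transport per receiver, reuse it while it is open, and close the
    // stale one before opening a replacement.
    public class EvictingPublisherSketch {
        private TTransport transport;

        public synchronized void ensureConnected(String host, int port)
                throws TTransportException {
            if (transport != null && transport.isOpen()) {
                return; // reuse the existing connection
            }
            if (transport != null) {
                transport.close(); // evict the stale connection first
            }
            transport = new TSocket(host, port);
            transport.open();
        }

        public synchronized void shutdown() {
            if (transport != null) {
                transport.close();
                transport = null;
            }
        }
    }

Whether a given deployment is affected depends on the Thrift agent feature version it carries, which is why verifying that the commit in [1] is present is the first step.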