Hi Martin,

I think we fixed this problem by uplifting the Thrift agent feature in [1].
According to [2], the root cause was that the Thrift agent failed to evict
the old connection after periodic publishing. The fix is described in [3].
Your stack trace looks very similar to what was reported in JIRA.
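
To illustrate the kind of leak I mean, here is a minimal sketch. It is only
my own illustration, not the actual Thrift agent code or the change in [1];
the class name EvictingConnectionCache and the key "receiver" are made up
for the example. The point is that when a connection is replaced, the old
one has to be closed explicitly, otherwise its socket (and file descriptor)
stays open.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: NOT the Thrift agent's real connection handling.
class EvictingConnectionCache {
    private final Map<String, Closeable> connections = new HashMap<>();

    // Store a connection under a key and close whichever connection it displaces.
    // Omitting the close() call is the kind of bug that leaks one descriptor
    // every time a connection is replaced.
    synchronized void put(String key, Closeable fresh) {
        Closeable old = connections.put(key, fresh);
        if (old != null && old != fresh) {
            try {
                old.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Number of connections currently held.
    synchronized int size() {
        return connections.size();
    }

    public static void main(String[] args) {
        EvictingConnectionCache cache = new EvictingConnectionCache();
        cache.put("receiver", () -> System.out.println("closed connection 1"));
        // Replacing the entry closes connection 1 before it is dropped.
        cache.put("receiver", () -> System.out.println("closed connection 2"));
        System.out.println("live connections: " + cache.size());
    }
}
```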

Can you check whether you have this fix applied?
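
If it helps while checking, here is a rough Java sketch for watching the
file descriptor count of a JVM, related to Imesh's suggestion below about
the OS limits. It assumes a Unix-like OS and a HotSpot/OpenJDK JVM that
exposes com.sun.management.UnixOperatingSystemMXBean; the class name FdUsage
is hypothetical. Note that it reports on the JVM it runs in, so for Stratos
or CEP it would have to run inside that process, or you would read the same
attributes (OpenFileDescriptorCount, MaxFileDescriptorCount) over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports file descriptor usage of the current JVM only.
class FdUsage {

    // Returns {open, max} descriptor counts, or null when the JVM/platform
    // does not expose the Unix-specific management extension.
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fd = openAndMax();
        if (fd == null) {
            System.out.println("File descriptor counts are not exposed on this platform/JVM");
        } else {
            System.out.printf("open=%d max=%d%n", fd[0], fd[1]);
        }
    }
}
```

If the open count keeps climbing towards the max while members are starting,
that would point to leaked connections rather than a limit that is simply
too low.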

[1] https://github.com/apache/stratos/commit/8985d96eb811aa8e9ce2c114f1856b4c4e20517b
[2] https://issues.apache.org/jira/browse/STRATOS-723
[3] https://issues.apache.org/jira/browse/STRATOS-739

Thanks.

On Sun, Sep 13, 2015 at 10:56 AM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Martin,
>
> I believe you are using 4.1.0-RC4 with some custom changes you have made
> locally. Would you be able to test this on the latest commit of the
> stratos-4.1.x branch (without any other changes)? I cannot recall a fix we
> made after 4.1.0-RC4 for this, but it would be better if you could verify
> with the latest code in the stratos-4.1.x branch.
>
> At the same time, would you be able to do the following:
>
>    - Take a thread dump of the running Stratos and CEP instances once this
>    happens
>    - Check the file descriptor limits of the OS
>
> Thanks
>
> On Sat, Sep 12, 2015 at 10:56 PM, Martin Eppel (meppel) <mep...@cisco.com>
> wrote:
>
>> Resending in case it got lost,
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Thursday, September 10, 2015 2:39 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* Stratos 4.1: "Too many open files" issue
>>
>>
>>
>> Hi,
>>
>>
>>
>> We are seeing an issue with Stratos running out of file handles when
>> creating a number of applications and VM instances.
>>
>>
>>
>> The scenario is as follows:
>>
>>
>>
>> 13 applications are deployed, each with a single cluster and a single
>> member instance.
>>
>>
>>
>> As the VMs spin up, Stratos becomes unresponsive, and on checking the logs
>> we find the following exceptions (see below). I remember we saw similar
>> issues (same exceptions) back in Stratos 4.0 in the context of longevity
>> tests.
>>
>>
>>
>> We are running Stratos 4.1 RC4 with the latest commit:
>>
>>
>>
>> commit 0fd41840fb04d92ba921bf58c59c2c3fbad0c561
>>
>> Author: Imesh Gunaratne <im...@apache.org>
>>
>> Date:   Tue Jul 7 12:54:47 2015 +0530
>>
>>
>>
>> Is this a known issue that might have been fixed in a later commit, or
>> something new? Can we verify that the fixes for the previous issues are
>> included in our system (jars, commits, etc.)?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>>     at java.lang.Thread.run(Thread.java:745)
>> TID: [0] [STRATOS] [2015-08-17 17:38:17,499] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message.
>> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.net.SocketException: Too many open files
>>
>>
>>
>> // listing the applications, member instances and cartridge state:
>>
>>
>>
>> [di-000-xxx] – application name,
>>
>>
>>
>> di-000-010: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>> di-000-011: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Initialized 1)
>>
>> cartridge-proxy: applicationInstances 1, groupInstances 0,
>> clusterInstances 1, members 1 (Active 1)
>>
>> di-000-001: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Active 1)
>>
>> di-000-002: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Active 1)
>>
>> di-000-012: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Created 1)
>>
>> di-000-003: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>> di-000-004: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>> di-000-006: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>> di-000-005: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>> di-000-008: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>> di-000-007: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>> di-000-009: applicationInstances 1, groupInstances 0, clusterInstances 1,
>> members 1 (Starting 1)
>>
>>
>>
>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Akila Ravihansa Perera
WSO2 Inc.;  http://wso2.com/

Blog: http://ravihansa3000.blogspot.com
