[ https://issues.apache.org/jira/browse/STRATOS-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092471#comment-14092471 ]
Reka Thirunavukkarasu commented on STRATOS-723:
-----------------------------------------------

This issue was introduced by the Thrift data agent code that is used to publish events from the cartridge agent, Stratos, and the load balancer. It has been fixed in the current WSO2 branch; see https://wso2.org/jira/browse/BAM-1748. Once WSO2 releases this component, we can add the fixed version to Stratos.

When changing the version of this component, we will have to update the following places:

1. Cartridge agent refers to version 4.2.0 of this component (org.apache.stratos.cartridge.agent)
2. Cloud controller refers to version 4.2.1 of this component (org.apache.stratos.cloud.controller)
3. Load balancer refers to version 4.2.0 of this component (org.apache.stratos.load.balancer, org.apache.stratos.load.balancer.common)
4. Stratos common and usage refer to version 4.2.0 (org.apache.stratos.common, org.apache.stratos.usage.agent)

All of the above should be changed to refer to 4.2.4 to get the fix after WSO2 releases this component.

We also need to make the cartridge agent configurable with the Thrift agent configuration, so that its parameters can be tuned to match the publishing interval. Currently the eviction time is 5.5s and the publishing interval is 10s, so every time we publish, the old connection has already been evicted and a new connection is created. By setting the connection eviction time higher than the publishing interval, we can make the agent reuse connections. All other Stratos components already allow this parameter to be tuned. We need to add this to the documentation.
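
To illustrate the eviction timing described above, here is a minimal, self-contained sketch. It is not the WSO2 data agent code: the class and method names are hypothetical, the model is a single publisher with a single pooled connection, and the 15s eviction time is only an assumed example of a value above the publishing interval.

```java
// Hypothetical simulation, not the actual Thrift data agent implementation.
class EvictionSketch {

    /**
     * Counts how many connections one publisher opens when it publishes
     * `publishes` times, once every publishIntervalMs, against a pool that
     * drops any connection that has been idle for longer than evictionMs.
     */
    static int connectionsOpened(long evictionMs, long publishIntervalMs, int publishes) {
        int opened = 0;
        boolean hasConnection = false;
        long lastUsedAt = 0;

        for (int i = 0; i < publishes; i++) {
            long now = i * publishIntervalMs;

            // The pool evicts a connection that sat idle past the eviction time.
            if (hasConnection && now - lastUsedAt > evictionMs) {
                hasConnection = false;
            }

            // No pooled connection left, so a new one has to be created.
            if (!hasConnection) {
                opened++;
                hasConnection = true;
            }

            lastUsedAt = now;
        }
        return opened;
    }

    public static void main(String[] args) {
        // Values from the comment: eviction 5.5s, publishing every 10s.
        System.out.println("eviction 5.5s: " + connectionsOpened(5_500, 10_000, 6) + " connections opened");
        // Assumed example value: eviction time above the publishing interval.
        System.out.println("eviction 15s:  " + connectionsOpened(15_000, 10_000, 6) + " connections opened");
    }
}
```

In this model, the 5.5s eviction time opens a new connection on every publish, while an eviction time above the 10s interval keeps reusing the first connection.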

Thanks,
Reka

> Stratos stopped working with java.net.SocketException
> -----------------------------------------------------
>
>                 Key: STRATOS-723
>                 URL: https://issues.apache.org/jira/browse/STRATOS-723
>             Project: Stratos
>          Issue Type: Bug
>          Components: Cloud Controller
>    Affects Versions: 4.0.0
>         Environment: Stratos 4.0.0 GA
>            Reporter: Jeffrey Nguyen
>             Fix For: 4.0.1
>
>
> We have a setup where Stratos stopped functioning after running for about 24
> hours. wso2carbon.log shows a lot of exceptions like one listed below. We
> were advised to increase "ulimit" from default 1024 to around 65k but that's
> only delaying the problem.
>
> TID: [0] [STRATOS] [2014-07-16 07:03:58,597] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message. {org.apache.thrift.server.TThreadPoolServer}
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketException: Too many open files
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>     at java.net.ServerSocket.accept(ServerSocket.java:498)
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:113)
>     ... 5 more

--
This message was sent by Atlassian JIRA
(v6.2#6252)