Touched, not typed. Erroneous words are a feature, not a typo.

On Jul 24, 2014 8:55 AM, "Jeffrey Nguyen (jeffrngu)" <jeffr...@cisco.com> wrote:
>
> Hi Devs,
>
> We have a setup where Stratos stopped functioning after running for about 24 hours. wso2carbon.log shows a lot of exceptions like the one listed below. We were advised to increase "ulimit" from the default 1024 to around 65k. Just wanted to post this issue here to see if anybody has come across it and has successfully fixed the problem.
>
> Naturally one would ask if increasing the ulimit would just defer the problem a little longer. I mean 1024 seems like a lot to me if resources are only acquired for a short duration and recycled properly. Which components/subcomponents of Stratos require such a large amount of resources?
>
> I took a memory dump of the hung Stratos process and ran memory leak analysis and found some leak suspects captured in the attached screenshots. These look very suspicious to me. Can someone take a look and see if they are legit or just false alarms? One of the suspect leaks points to org.wso2.carbon.event.builder.core.internal.CarbonEventBuilderService, which is part of the Carbon core. I don't believe we have source code for Carbon core in the Stratos code base.
>
> If increasing ulimit ends up being the ultimate solution to this problem, have we done any kind of analysis that shows system load (e.g. number of VMs spawned) vs. ulimit values? I mean how do we know which ulimit value is good given a pre-defined system load? Or what type of load can a 65k ulimit withstand?
>

AFAIR there was no analysis of it. Yes, it is a good analysis to perform.

> Also, besides the default mysql DB, does Stratos use any other types of DB like Cassandra? I can see that we're launching Stratos with the option "-Ddisable.cassandra.server.startup=true" so I assume there's no embedded Cassandra.
>

The Cassandra database is used for BAM, for log publishing. I think Dinesh can provide a detailed answer for this.

> From the exception stack trace below, it looks like Stratos uses the Apache Thrift transport protocol. Just for my information, where do I go to learn about how/where this is used within Stratos?
>

The Thrift protocol is used for communication between CEP and the cartridge agents, when a cartridge agent sends statistics to CEP.

> TID: [0] [STRATOS] [2014-07-16 07:03:58,597] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message. {org.apache.thrift.server.TThreadPoolServer}
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>     at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
>     at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketException: Too many open files
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>     at java.net.ServerSocket.accept(ServerSocket.java:498)
>     at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:113)
>     ... 5 more
>
> Regards,
> -Jeffrey
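
On whether a higher ulimit only defers the problem: one way to check is to sample the number of open file descriptors over time and see whether it levels off or keeps climbing. Below is a minimal sketch of that idea, not something taken from Stratos or Carbon. It assumes a HotSpot/OpenJDK JVM on Linux or another Unix, where the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean. The class name FdMonitor and its method names are made up for illustration. It reports the counts for the JVM it runs in, so to observe Stratos it would have to run inside the Stratos process, or the same MXBean attributes would have to be read over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Stratos or Carbon.
public class FdMonitor {

    /** Fraction of the descriptor limit in use, or -1 if the limit is not positive. */
    public static double usageRatio(long open, long max) {
        return max > 0 ? (double) open / max : -1.0;
    }

    /**
     * Returns {open, max} for the current JVM, or null when the JVM does not
     * expose Unix descriptor counts (for example on Windows).
     */
    public static long[] readFdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fd = readFdCounts();
        if (fd == null) {
            System.err.println("File descriptor counts are not available on this JVM");
            return;
        }
        // Print one sample; log this periodically to see the trend.
        System.out.printf("open=%d max=%d ratio=%.3f%n",
                fd[0], fd[1], usageRatio(fd[0], fd[1]));
    }
}
```

If the open count grows steadily while the load (number of cartridge agents publishing statistics) stays constant, that would point to a leak rather than to a limit that is simply too low.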