Hi,

After some poking around, I think the problem here is as follows:
1. Apache Stratos contains multiple direct dependencies on libthrift.
   o We can modify the build to change the version of libthrift in these cases, except that we need 0.9.3, and (AFAIK) the version has to be in the wso2 maven repositories.
2. Apache Stratos also contains one or more indirect dependencies on libthrift (e.g. via wso2’s carbon) for which we have no source or ability to rebuild.
3. From the stack trace, we cannot tell which instance of these dependencies is at fault.

Suggestions/advice welcome!

Thanks, Shaheed

From: Shaheedur Haque (shahhaqu)
Sent: 25 November 2015 11:45
To: dev@stratos.apache.org
Cc: Martin Eppel (meppel); Ali Bidabadi (abidabad)
Subject: RE: File handle leak in Thrift

BTW, to be clear, I have no idea *which* copy of libthrift is at fault, as apache-stratos seems to contain more than one either directly or indirectly (via carbon?); the stack trace I provided is the full stack trace.

From: Shaheedur Haque (shahhaqu)
Sent: 25 November 2015 11:15
To: dev@stratos.apache.org
Cc: Martin Eppel (meppel); Ali Bidabadi (abidabad)
Subject: RE: File handle leak in Thrift

Hi Imesh,

I’m pretty sure that STRATOS-1108 is some other leak/issue, not least because we have that fix ☺. You will see my analysis has a rather different stack trace, and points to a very obvious coding bug in Thrift.

As I say, I am happy to verify that the Thrift bug fix addresses the issue we see, but to do so I believe I need the wso2 package repository to have the new Thrift version (or else some instructions on how to point to maven.org for this package only).
Thanks, Shaheed

From: Imesh Gunaratne [mailto:im...@apache.org]
Sent: 25 November 2015 02:26
To: dev
Cc: Martin Eppel (meppel); Ali Bidabadi (abidabad)
Subject: Re: File handle leak in Thrift

Hi Shaheed,

AFAIK we fixed this memory issue in CEP in Stratos 4.1.0 RC3:

https://issues.apache.org/jira/browse/STRATOS-1108

Thanks

On Tue, Nov 24, 2015 at 2:51 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

Hi Isuru,

I now believe I cited the wrong dependency in my original email. In fact, I don’t know what the correct dependency is to update Thrift. You’ll see from my second email that I tried setting thrift.version here:

    $ grep -r thrift.version ../stratos/
    ../stratos/features/manager/logging-mgt/pom.xml: <version>${libthrift.version}</version>
    ../stratos/features/manager/logging-mgt/pom.xml: <libthrift.version>0.7.wso2v1</libthrift.version>

But as I said, that just gave the build error I noted. Also confusing me is that the above pom.xml sets the version to 0.7.wso2v1 whereas my .zip file clearly contains 0.7.wso2v2.

Note that I am not familiar with how Maven config works...so any clarification is most welcome!

Thanks, Shaheed

From: isu...@wso2.com [mailto:isu...@wso2.com] On Behalf Of Isuru Haththotuwa
Sent: 24 November 2015 02:26
To: dev
Cc: Martin Eppel (meppel); Ali Bidabadi (abidabad)
Subject: Re: File handle leak in Thrift

Shaheed,

I could not find the jar with the mentioned version hosted in the relevant nexus repository [1]. Can you please double check if the version is correct?

[1]. http://maven.wso2.org/nexus/content/groups/wso2-public/org/wso2/carbon/org.wso2.carbon.databridge.agent.thrift/

On Mon, Nov 23, 2015 at 11:41 PM, Shaheedur Haque (shahhaqu) <shahh...@cisco.com> wrote:

It seems the upstream fix is in Thrift 0.9.3.
Now, I think I pasted the wrong dependency in the email below, but changing the variable “thrift.version” to 0.9.3 simply resulted in a build failure:

    [ERROR] Failed to execute goal on project org.apache.stratos.common: Could not resolve dependencies for project org.apache.stratos:org.apache.stratos.common:bundle:4.1.0: Could not find artifact org.wso2.carbon:org.wso2.carbon.databridge.agent.thrift:jar:0.9.3 in central (http://repo1.maven.org/maven2) -> [Help 1]

I’m not sure (a) if I got the right variable, and if I did (b) why it did not work. How else do I get the fix?

From: Shaheedur Haque (shahhaqu)
Sent: 23 November 2015 13:46
To: d...@stratos.incubator.apache.org
Cc: Martin Eppel (meppel); Ali Bidabadi (abidabad)
Subject: File handle leak in Thrift

Hi all,

I believe that Stratos is missing a memory leak fix in libthrift_0.7.0.wso2v2.jar as follows…

1. For unknown reasons, we sometimes get Stratos’ memory footprint growing from the normal “1.0something” GB of virtual memory to 10 GB and then 34 GB in a matter of seconds:

    top - 21:21:55 up 4:39, 1 user, load average: 0.01, 0.08, 0.18
    Tasks: 135 total, 3 running, 132 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 2.5 us, 0.9 sy, 0.0 ni, 95.3 id, 1.1 wa, 0.0 hi, 0.1 si, 0.0 st
    KiB Mem: 16434456 total, 9867416 used, 6567040 free, 95772 buffers
    KiB Swap: 0 total, 0 used, 0 free. 6485696 cached Mem

      PID USER  PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
     2913 netiq 20  0 4702012 1.083g 18976 S  2.9  6.9 1:28.04 java
     2741 root  20  0 3395640 600544 14504 S  1.8  3.7 0:34.09 java
    25941 root  20  0  186084  37700 26636 S  0.7  0.2 1:48.86 corosync
    ...

    top - 21:23:55 up 4:41, 1 user, load average: 1.08, 0.55, 0.35
    Tasks: 137 total, 3 running, 134 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 34.1 us, 10.8 sy, 0.0 ni, 45.9 id, 0.9 wa, 0.0 hi, 8.3 si, 0.0 st
    KiB Mem: 16434456 total, 10957936 used, 5476520 free, 96024 buffers
    KiB Swap: 0 total, 0 used, 0 free. 6599088 cached Mem

      PID USER  PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
     2913 netiq 20  0 10.236g 1.411g 18956 S 91.0  9.0 3:17.37 java
     2741 root  20  0 3395776 621352 14520 S 12.3  3.8 0:48.84 java
    25941 root  20  0  186084  37700 26636 S  0.7  0.2 1:49.68 corosync
    ...

2. The logs fill very rapidly at this point, so all we see is that after the fact, all 10 GB of logs look like this:

    TID: [0] [STRATOS] [2015-11-22 21:27:22,795] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message.
    org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
        at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
        at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:113)
        ... 5 more

Now a cursory glance at upstream shows this was probably fixed upstream in 2015:

https://git-wip-us.apache.org/repos/asf?p=thrift.git;a=commitdiff;h=b1a35da9168cca5a7524ab9814161f024da145df

and given that our 0.7.0 jar file has content dated 2011, it likely does not have the fix. I also note that upstream has evolved considerably overall.

Now, what I am not sure of is whether we are using an old library for some specific reason, e.g. was it hacked/modified by wso2?
Is the new code not compatible with the Stratos codebase? If I am looking in the right place, the stratos/components/org.apache.stratos.common/pom.xml seems to be picking up a specific version:

    <dependency>
      <groupId>org.wso2.carbon</groupId>
      <artifactId>org.wso2.carbon.databridge.agent.thrift</artifactId>
      <version>${wso2carbon.version}</version>
    </dependency>

Do we know why? How do we go about getting the fix?

Please advise,

Thanks, Shaheed

--
Thanks and Regards,
Isuru H.
+94 716 358 048

--
Imesh Gunaratne
Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
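
Note on finding the libthrift copies: the thread leaves open which bundled copy of libthrift is at fault. One possible way to at least list the candidates is to scan an unpacked Stratos distribution for every jar that contains the class at the top of the stack trace. The sketch below is only an illustration under stated assumptions: the helper name find_jars_with_class is hypothetical, the directory to scan is whatever path the distribution was unpacked to, and jars nested inside other jars are not inspected.

```python
import zipfile
from pathlib import Path


def find_jars_with_class(root, class_name):
    """Return the .jar files under root that contain class_name.

    class_name is a dotted Java name, e.g.
    "org.apache.thrift.transport.TServerSocket".
    Hypothetical helper: jars nested inside other jars are not searched.
    """
    # A class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files named *.jar that are not valid zip archives
            continue
    return matches
```

Calling find_jars_with_class(path_to_unpacked_stratos, "org.apache.thrift.transport.TServerSocket") returns the matching jar paths. More than one match would confirm that several copies are shipped, though it does not by itself say which one the running server loaded.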