Hi all,

I believe that Stratos is missing a memory leak fix in 
libthrift_0.7.0.wso2v2.jar, as follows.


1. For unknown reasons, we sometimes see Stratos' memory footprint grow 
from the normal "1.0something" GB of virtual memory to 10 GB and then 34 GB in 
a matter of seconds:



top - 21:21:55 up  4:39,  1 user,  load average: 0.01, 0.08, 0.18
Tasks: 135 total,   3 running, 132 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.5 us,  0.9 sy,  0.0 ni, 95.3 id,  1.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:  16434456 total,  9867416 used,  6567040 free,    95772 buffers
KiB Swap:        0 total,        0 used,        0 free.  6485696 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2913 netiq     20   0 4702012 1.083g  18976 S   2.9  6.9   1:28.04 java
 2741 root      20   0 3395640 600544  14504 S   1.8  3.7   0:34.09 java
25941 root      20   0  186084  37700  26636 S   0.7  0.2   1:48.86 corosync
...



top - 21:23:55 up  4:41,  1 user,  load average: 1.08, 0.55, 0.35
Tasks: 137 total,   3 running, 134 sleeping,   0 stopped,   0 zombie
%Cpu(s): 34.1 us, 10.8 sy,  0.0 ni, 45.9 id,  0.9 wa,  0.0 hi,  8.3 si,  0.0 st
KiB Mem:  16434456 total, 10957936 used,  5476520 free,    96024 buffers
KiB Swap:        0 total,        0 used,        0 free.  6599088 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2913 netiq     20   0 10.236g 1.411g  18956 S  91.0  9.0   3:17.37 java
 2741 root      20   0 3395776 621352  14520 S  12.3  3.8   0:48.84 java
25941 root      20   0  186084  37700  26636 S   0.7  0.2   1:49.68 corosync
...



2. The logs fill very rapidly at this point, so all we can see after the 
fact is that all 10 GB of logs look like this:

TID: [0] [STRATOS] [2015-11-22 21:27:22,795]  WARN {org.apache.thrift.server.TThreadPoolServer} -  Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
        at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
        at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:113)
        ... 5 more
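
Since the trace points at file descriptor exhaustion, it may help to watch the 
descriptor count before the logs flood. The following is only a sketch: the 
class name FdUsage is made up for illustration, it reports on the JVM it runs 
in (so it would have to be run inside the Stratos process, or the same MXBean 
read over JMX), and it assumes a HotSpot-style JVM on Unix that exposes 
com.sun.management.UnixOperatingSystemMXBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Stratos or Thrift.
public class FdUsage {

    /**
     * Returns {open, max} file descriptor counts for the current JVM,
     * or null when the platform MXBean does not expose them.
     */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("file descriptor counts not available on this JVM");
            return;
        }
        System.out.printf("open fds: %d of %d%n", counts[0], counts[1]);
    }
}
```

If the open count climbs steadily towards the maximum well before the "Too many 
open files" warnings start, that would be consistent with sockets not being 
closed, though it would not by itself show where they leak.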

Now a cursory glance at upstream suggests this was probably fixed there in 2015:



https://git-wip-us.apache.org/repos/asf?p=thrift.git;a=commitdiff;h=b1a35da9168cca5a7524ab9814161f024da145df

and given that our 0.7.0 jar file has content dated 2011, it likely does not 
have the fix. I also note that upstream has evolved considerably overall. 
What I am not sure of is whether we are using an old library for some specific 
reason: was it hacked/modified by WSO2? Is the new code incompatible with the 
Stratos codebase? If I am looking in the right place, 
stratos/components/org.apache.stratos.common/pom.xml seems to be picking up a 
specific version:

    <dependency>
        <groupId>org.wso2.carbon</groupId>
        <artifactId>org.wso2.carbon.databridge.agent.thrift</artifactId>
        <version>${wso2carbon.version}</version>
    </dependency>
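
For anyone wanting to repeat the "content dated 2011" check on their own copy 
of the jar, here is a rough sketch that prints the newest entry timestamp in a 
jar file. The class name JarDates is hypothetical, and the jar path is passed 
as an argument because where libthrift_0.7.0.wso2v2.jar lives in a given 
install is not something I want to assume. Entry timestamps only show when the 
archive was built, so they are a hint about the fix being absent, not proof.

```java
import java.io.IOException;
import java.time.Instant;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Stratos or Thrift.
public class JarDates {

    /**
     * Returns the newest entry timestamp in the jar as epoch milliseconds,
     * or -1 if no entry carries a timestamp.
     */
    static long newestEntryMillis(String jarPath) throws IOException {
        long newest = -1;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                // getTime() returns -1 when the entry has no timestamp.
                newest = Math.max(newest, entries.nextElement().getTime());
            }
        }
        return newest;
    }

    public static void main(String[] args) throws IOException {
        long newest = newestEntryMillis(args[0]);
        System.out.println(newest < 0
                ? "no entry timestamps found"
                : "newest entry: " + Instant.ofEpochMilli(newest));
    }
}
```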

Do we know why that version is pinned? How do we go about getting the fix? 
Please advise,

Thanks, Shaheed
