Something was annoying me about this as I seemed to recall opening/closing thousands of connections and ending up with a <4MB heap on a 32bit VM previously (turns out it was ~2.6MB after checking). Also, some of the leaks I previously removed for 0.10 held ServerConnection and ServerSession objects in memory so there is really no way I wouldn't have seen this at that point.
Actually checking the commits properly this time, the defect was not introduced by what I thought it was; it has only existed on trunk on March 8th. 0.10 is not affected by this particular leak. Robbie From: Robbie Gemmell [mailto:[email protected]] Sent: 24 March 2011 18:12 To: [email protected] Subject: Re: MINANetworkDriver.sessionClosed does not get called at times Was this test/work run/done against trunk, or against 0.8? The 0.8 release (and to much lesser extent 0.6 too) had several extremely nasty memory issues, but whilst undertaking the work for 0.10 that Andrew referenced below I was able to run a couple thousand connections into the broker within around 20MB of heap at the end (on a 64bit JVM, so knock maybe 75% off for a 32bit JVM). I have just performed the indicated test and had the same result. Having taken and examined the heap dump of the above test I can see that we are indeed leaking connections and associated objects, however not for the reasons indicated so far. The 0-10 connections historically were not added to the ConnectionRegistry but now are being added, and the code for the feature which added them also does remove them when that feature is used, however it fails to remove them when the close is invoked from the client side (ie, the normal case). As a result, the connection is left in the ConnectionRegistry and is held in memory along with its associated Objects. (...and I have jsut noticed Andrew said as much below, doh..always read the full mail :P) I dont think nulling the references between the objects in question is the way to go here, it might help prevent the impact to a certain extent but it doesnt fix the underlying issue and may just introduce more (there is almost certainly scope for NPEs in there). Having crudely put in removal of the connections from the registry upon a standard clsoe, I was able to run 2000 connections into the broker and come out with 4MB of heap used instead of 13MB previously. I will look to properly put this fix into 0.11/0.12 soon, but its too late for 0.10 now; since this is actually not a regression since 0.6 or 0.8 (from which memory usage is actually massively improved) I definitely dont think its a blocker. As Andrew also noted, a specific test for this might be useful. Robbie On 24 March 2011 00:11, Andrew Kennedy <[email protected]> wrote: Thanks. We had looked at the memory usage of 0-10 connections and seen a similar effect when Robbie was fixing some other memory leaks. The 64K element Method array in the 0-10 Session objects is needed so that sessions can synchronise with the client, although perhaps a more efficient structure could be used to store them, such as a Map that can grow and shrink dynamically? As you say, overall memory usage is much reduced now, so the real issue is the failure to deregister connections in the broker. I will have to try your code from QPID-3162 to see if the same issue still exists on the updated 0-10 network stack, since I remember adding explicit calls to remove connections from the broker registry... Looking at the patch, I have a few points: * in MultiVersionProtocolEngine you set _delegate to null, same with _protocolEngine in MINANetworkDriver, and the Mina IoSession attachment is also set to null. Which of these was *actually* preventing the GC from reclaiming memory? If we know that, then we can remove only that reference. This many seems like overkill if the session can be closed properly. * in Connection, setState(CLOSED) is functionally equivalent to the two lines you replace it with, since the monitor on lock is held already. * calling _connection.onOpen(null) from the closed method in ProtocolEngine_0_10 seems odd. It sets the _onOpenTask of ServerConnection to null, which you could do in setState(CLOSED) of ServerConnection (see previous item) instead? I will need to investigate a bit further, but adding a test case seems like a useful thing to try and do. Andrew. -- -- andrew d kennedy ? do not fold, bend, spindle, or mutilate ; -- http://grkvlt.blogspot.com/ ? edinburgh : +44 7582 293 255 ; On 23 Mar 2011, at 19:06, Danushka Menikkumbura wrote: Just run the sample client that I have mentioned in [1] and see the number of ProtocolEngine_0_10 and related objects in heap on JProfiler. You should be able to see 50 of them. [1] - https://issues.apache.org/jira/browse/QPID-3162 Thanks, Danushka On Thu, Mar 24, 2011 at 12:13 AM, Danushka Menikkumbura <[email protected]> wrote: Hi Andrew, I just checked the latest trunk. The issue is still there, the memory consumption is very less, though. As I can see its because of the way the incomplete Method array is allocated in Assembler now. But still I can see the same result w.r.t. instances that are not GC'ed. Thanks, Danushka On Wed, Mar 23, 2011 at 6:07 PM, Andrew Kennedy <[email protected]> wrote: Also, I will add a test case to the new network code. I haven't seen this behaviour on the branch I'm working on, so it may be fixed. I'll wait for your patch before checking again, though. I should have more time to work on the in-vm networking branch now that I've started a new job; I know I said it would be part of 0.10 but circumstances meant I didn't have enough time to finish the work. Andrew. -- -- andrew d kennedy ? do not fold, bend, spindle, or mutilate ; -- http://grkvlt.blogspot.com/ ? edinburgh : +44 7582 293 255 ; On 23 Mar 2011, at 10:47, Andrew Kennedy wrote: Hi, Danushka. I'll take a look at this, it looks like a race during session close, which we have seen before. Andrew. -- -- andrew d kennedy ? edinburgh : +44 7582 293 255 On 23 March 2011 00:43, Danushka Menikkumbura <[email protected]> wrote: Hi, Any idea what is going wrong here?. Thanks, Danushka On Tue, Mar 22, 2011 at 12:30 PM, Danushka Menikkumbura < [email protected]> wrote: You can run a simple connection create/close loop and check the number of times MINANetworkDriver.sessionClosed gets called. When I did that I observed this particular method got only called 49 time when I ran the loop for 50 iterations, 99 times for 100 and 994 for 1000. Am I missing something here?. I also see that the ProtocolEngine_0_10 instance count in heap is equal to the number of times this method did not get called. Hence the memory leak. These objects retain in the memory as they are not removed from the _typeMap in ConfigStore as removeConfiguredObject does not get called. Thanks, Danushka On Tue, Mar 22, 2011 at 5:09 AM, Danushka Menikkumbura < [email protected]> wrote: Hi devs, I see sometimes (1/50) MINANetworkDriver.sessionClosed does not get called causing a memory leak. Is this a known issue?. Thanks, Danushka --------------------------------------------------------------------- Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:[email protected]
