[ 
https://issues.apache.org/jira/browse/CASSANDRA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718243#comment-16718243
 ] 

Joseph Lynch commented on CASSANDRA-14922:
------------------------------------------

Nailed the MessageSink leak, and the branch now properly shuts down logback 
and cleans up ThreadLocal variables, which afaict means that there are no 
additional threads that could hold references ... and at least now [my 
branch|https://github.com/jolynch/cassandra/tree/CASSANDRA-14922] can pass unit 
tests in 
[circleci|https://circleci.com/workflow-run/6fb24842-bbb8-4aac-b137-4007729bf39a],
 so that's good.

Unfortunately I think we're still leaking: according to my heap dump analysis, 
the _main_ thread ends up with a strong reference to the 
{{InstanceClassLoaders}}, which is really odd since the InstanceClassLoader 
doesn't have any static state, so I don't see how those don't go out of scope 
when the test methods finish.

[~ifesdjeen] tbh at this point I'm pretty stuck. My understanding is that the 
{{InstanceClassLoaders}} should go out of scope after each test method, which 
should allow everything to GC now that there are no more live references from 
threads to the {{InstanceClassLoaders}}, but that's not happening. I think it 
might be related to passing in the current thread's classloader 
[here|https://github.com/jolynch/cassandra/blob/1ca6ad3c41f4456b674e13883e0df0091f638564/test/distributed/org/apache/cassandra/distributed/TestCluster.java#L241]
 but I'm not sure how to achieve what we need there without that.
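
A quick way to probe this kind of reachability outside of a heap dump is a {{WeakReference}} check. The sketch below is only an illustration under assumptions: it uses a plain {{URLClassLoader}} as a stand-in for {{InstanceClassLoader}}, the class and method names ({{ClassLoaderLeakCheck}}, {{newLoader}}, {{collected}}) are hypothetical, and pinning the loader through the thread's context class loader is just one way a thread can hold a loader, not a claim about where the actual leak is. {{System.gc()}} is only a hint, so a "not collected" result after restoring the context class loader is suggestive rather than conclusive.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

class ClassLoaderLeakCheck {
    /** Creates a throwaway loader with the given parent (stand-in for an InstanceClassLoader). */
    static URLClassLoader newLoader(ClassLoader parent) {
        return new URLClassLoader(new URL[0], parent);
    }

    /** Best-effort check: true if the referent was cleared within the given number of GC attempts. */
    static boolean collected(WeakReference<?> ref, int attempts) throws InterruptedException {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();      // only a hint to the JVM, not a guarantee
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws Exception {
        Thread main = Thread.currentThread();
        ClassLoader original = main.getContextClassLoader();

        URLClassLoader loader = newLoader(original);
        WeakReference<ClassLoader> ref = new WeakReference<>(loader);

        // Simulate a pinned loader: the thread now strongly references it.
        main.setContextClassLoader(loader);
        loader = null;
        System.out.println("collected while pinned by thread: " + collected(ref, 5));

        // Drop the thread's reference; the loader should now be eligible for GC.
        main.setContextClassLoader(original);
        System.out.println("collected after restore: " + collected(ref, 20));
    }
}
```

Note that passing a loader in as the _parent_ makes the child reference the parent, not the other way around, so on its own it would not be expected to keep the child alive.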

> In JVM dtests need to clean up after instance shutdown
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14922
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14922
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Minor
>         Attachments: AllThreadsStopped.png, ClassLoadersRetaining.png, 
> Leaking_Metrics_On_Shutdown.png, MainClassRetaining.png, 
> OnlyThreeRootsLeft.png
>
>
> Currently the unit tests are failing on circleci ([example 
> one|https://circleci.com/gh/jolynch/cassandra/300#tests/containers/1], 
> [example 
> two|https://circleci.com/gh/rustyrazorblade/cassandra/44#tests/containers/1]) 
> because we use a small container (medium) for unit tests by default and the 
> in JVM dtests are leaking a few hundred megabytes of memory per test right 
> now. This is not a big deal because dtest runs with the larger containers 
> continue to function fine, as does local testing, since the number of in JVM 
> dtests is not yet high enough to cause a problem with more than 2GB of 
> available heap. However, we should fix the memory leak so that going forward 
> we can add more in JVM dtests without worry.
> I've been working with [~ifesdjeen] to debug, and the issue appears to be 
> unreleased Table/Keyspace metrics (screenshot showing the leak attached). I 
> believe that we have a few potential issues that are leading to the leaks:
> 1. The 
> [{{Instance::shutdown}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/Instance.java#L328-L354]
>  method is not successfully cleaning up all the metrics created by the 
> {{CassandraMetricsRegistry}}
> 2. The 
> [{{TestCluster::close}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/TestCluster.java#L283]
>  method is not waiting for all the instances to finish shutting down and 
> cleaning up before continuing on
> 3. I'm not sure if this is an issue assuming we clear all metrics, but 
> [{{TableMetrics::release}}|https://github.com/apache/cassandra/blob/4ae229f5cd270c2b43475b3f752a7b228de260ea/src/java/org/apache/cassandra/metrics/TableMetrics.java#L951]
>  does not release all the metric references (which could leak them)
> I am working on a patch which shuts down everything and ensures that we do 
> not leak memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
