[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150172#comment-16150172 ]
Markus Dlugi commented on CASSANDRA-13754:
------------------------------------------

[~snazy], I don't think the node is overloaded. I originally thought so as well, so I ran a small experiment: I added a cap to our load test that limited the {{INSERT}}s per minute from ~25,000 to ~10,000. As a consequence, the node survived a little longer, but in the end it still died with an {{OutOfMemoryError}} after more data had been inserted. So the problem is not too many active writes; the node fails after a certain total number of writes, which indicates to me that a memory leak is indeed happening.

I also had another look at the heap dump I sent you, and you are correct that the heap is mostly filled with {{BTree$Builder}} instances that still have content in their {{values}} array. However, if you look closer, you will notice that in each of these instances the first entries of the {{values}} array are {{null}}, and the actual content only comes after them. For some reason, the actual content always starts at index 28, whereas indices 0 - 27 are {{null}} - not sure if this is a coincidence? You can also see that for all the {{BTree$Builder}} objects the {{count}} attribute is 0, which also indicates to me that {{BTree$Builder.cleanup()}} has already run and those are not active writes. This theory is supported by the fact that my workaround of manually calling {{FastThreadLocal.removeAll()}} actually works, because it means that nothing except the {{FastThreadLocal}}s still holds references to the builders.

Therefore, I think we have two issues here:

# {{SEPWorker}} never cleans up its {{FastThreadLocal}}s and therefore accumulates references to otherwise dead objects - maybe we can include something to at least remove non-static entries regularly?
# {{BTree$Builder}} seems to have an issue properly cleaning up after building, so the objects referenced by the {{FastThreadLocal}}s of the {{SEPWorker}} threads are very large and thus ultimately lead to the {{OutOfMemoryError}}s

> FastThreadLocal leaks memory
> ----------------------------
>
>                 Key: CASSANDRA-13754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13754
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15
>            Reporter: Eric Evans
>            Assignee: Robert Stupp
>             Fix For: 3.11.1
>
>
> After a chronic bout of {{OutOfMemoryError}} in our development environment,
> a heap analysis is showing that more than 10G of our 12G heaps are consumed
> by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}})
> of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances.
> Reverting
> [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54]
> fixes the issue.
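
To illustrate the two issues described in the comment, here is a minimal, self-contained sketch of how a builder cached in a thread-local can keep pinning memory on a long-lived worker thread when its cleanup resets {{count}} but leaves references in {{values}}. Everything in it is an assumption made for illustration: {{ReusableBuilder}}, {{leakyCleanup()}}, {{fullCleanup()}} and {{retainedAfter()}} are hypothetical names, plain {{java.lang.ThreadLocal}} stands in for Netty's {{FastThreadLocal}}, and the code is not Cassandra's {{BTree$Builder}}. It also does not reproduce the exact heap-dump pattern (leading {{null}} entries followed by content); it only shows the general retention mechanism.

```java
import java.util.Arrays;

// Toy model only: hypothetical names, not Cassandra's or Netty's actual code.
class ThreadLocalRetentionSketch {

    /** Hypothetical reusable builder that is cached per thread. */
    static final class ReusableBuilder {
        Object[] values = new Object[64];
        int count;

        void add(Object v) {
            if (count == values.length) {
                values = Arrays.copyOf(values, count * 2);
            }
            values[count++] = v;
        }

        /** Incomplete cleanup: resets count but keeps the references in values. */
        void leakyCleanup() {
            count = 0;
        }

        /** Complete cleanup: clears every used slot before resetting count. */
        void fullCleanup() {
            Arrays.fill(values, 0, count, null);
            count = 0;
        }

        /** Number of slots that still reference an object. */
        int retained() {
            int n = 0;
            for (Object v : values) {
                if (v != null) {
                    n++;
                }
            }
            return n;
        }
    }

    // java.lang.ThreadLocal used here as a stand-in for FastThreadLocal (assumption).
    static final ThreadLocal<ReusableBuilder> CACHED =
            ThreadLocal.withInitial(ReusableBuilder::new);

    /** Adds 100 payloads to this thread's builder, cleans up, reports what is still referenced. */
    static int retainedAfter(boolean full) {
        ReusableBuilder b = CACHED.get();
        for (int i = 0; i < 100; i++) {
            b.add(new byte[1024]);
        }
        if (full) {
            b.fullCleanup();
        } else {
            b.leakyCleanup();
        }
        return b.retained();
    }

    public static void main(String[] args) {
        System.out.println("leaky cleanup retains: " + retainedAfter(false));
        CACHED.remove(); // drops this thread's builder, comparable in spirit to the removeAll() workaround
        System.out.println("full cleanup retains:  " + retainedAfter(true));
    }
}
```

In this sketch the builder reports {{count == 0}} after {{leakyCleanup()}} yet still references all 100 payloads, and because the thread-local belongs to a thread that never exits, those payloads stay reachable. Either clearing the array in the cleanup or removing the thread-local entry releases them, which corresponds to the two remedies suggested in the list above.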