[ 
https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150172#comment-16150172
 ] 

Markus Dlugi commented on CASSANDRA-13754:
------------------------------------------

[~snazy], I don't think the node is overloaded. I originally thought so as 
well, so I ran a small experiment in which I capped our load test, reducing 
the {{INSERT}} s per minute from ~25,000 to ~10,000. As a consequence, 
the node survived a little longer, but in the end it still died with an 
{{OutOfMemoryError}} after more data had been inserted. So the problem is not 
too many concurrent writes; the node fails after a certain 
amount of total writes, which indicates to me that a memory leak is indeed 
happening.

I also had another look at the heap dump I sent you, and you are correct that 
the heap is mostly filled with {{BTree$Builder}} instances that still have 
entries in their {{values}} array. However, if you look closer, you will notice 
that for each of these instances, the {{values}} array always contains {{null}} 
for the first couple of entries, and the actual content only begins after 
those. For some reason, the actual content always starts at index 28, whereas 
indices 0 - 27 are {{null}} - not sure if this is a coincidence? You can 
also see that for all the {{BTree$Builder}} objects, the {{count}} attribute is 
0, which also indicates to me that {{BTree$Builder.cleanup()}} has already run 
and those are not active writes. This theory is supported by the fact that my 
little workaround of manually calling {{FastThreadLocal.removeAll()}} actually 
works: if removing the thread-locals frees the memory, then nothing other than 
the {{FastThreadLocal}} s still holds references to the builders.
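
To make the observation above concrete, here is a minimal sketch of how a pooled builder could end up with {{count}} at 0 while its {{values}} array still holds entries. Everything in it is an assumption made for illustration: {{PooledBuilder}}, {{collapseTo()}} and {{LeakDemo}} are hypothetical names, a plain {{java.lang.ThreadLocal}} stands in for Netty's {{FastThreadLocal}}, and the partial cleanup is only one possible way to reach the state seen in the heap dump, not a confirmed description of {{BTree$Builder}}.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a pooled builder. This is NOT
// Cassandra's BTree.Builder; it only mimics the state seen in the heap dump.
final class PooledBuilder {
    Object[] values = new Object[64];
    int count;

    void add(Object value) {
        if (count == values.length) {
            values = Arrays.copyOf(values, count * 2);
        }
        values[count++] = value;
    }

    // Assumed for illustration: some step lowers count without clearing
    // the slots above the new count.
    void collapseTo(int kept) {
        count = Math.min(count, kept);
    }

    // Clears only indices [0, count), so anything above count stays referenced.
    void cleanup() {
        Arrays.fill(values, 0, count, null);
        count = 0;
    }

    int retainedEntries() {
        int retained = 0;
        for (Object value : values) {
            if (value != null) {
                retained++;
            }
        }
        return retained;
    }
}

final class LeakDemo {
    // Plain ThreadLocal used as a stand-in for Netty's FastThreadLocal.
    static final ThreadLocal<PooledBuilder> POOL =
            ThreadLocal.withInitial(PooledBuilder::new);

    static PooledBuilder simulateWrite(int entries, int keptAfterCollapse) {
        PooledBuilder builder = POOL.get();
        for (int i = 0; i < entries; i++) {
            builder.add(new byte[1024]);
        }
        builder.collapseTo(keptAfterCollapse);
        builder.cleanup();
        return builder;
    }

    public static void main(String[] args) {
        PooledBuilder builder = simulateWrite(40, 28);
        // prints "count=0, retained=12"
        System.out.println("count=" + builder.count
                + ", retained=" + builder.retainedEntries());
        // Dropping the thread-local entry releases the builder and its payloads.
        POOL.remove();
    }
}
```

In this sketch the builder looks idle ({{count}} is 0) yet indices 28 to 39 remain populated, and as long as the thread lives, its thread-local keeps the builder and those payloads reachable.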

Therefore, I think we have two issues here:

# {{SEPWorker}} never cleans the {{FastThreadLocal}} s and therefore 
accumulates references to otherwise dead objects - maybe we could add 
something to at least remove non-static entries regularly?
# {{BTree$Builder}} does not seem to clean up properly after building, 
so the objects referenced by the {{FastThreadLocal}} s of the {{SEPWorker}} 
threads are very large and thus ultimately lead to the {{OutOfMemoryError}} s
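
As a rough sketch of what addressing both points might look like, the following uses hypothetical names throughout ({{ScratchBuilder}}, {{WorkerSketch}}, {{onIdle()}}) and a plain {{java.lang.ThreadLocal}} in place of {{FastThreadLocal}}. It is not the actual Cassandra fix, just one way to express the two ideas under those assumptions: track the highest slot written so that cleanup clears everything, and have the worker drop its thread-local state when it goes idle.

```java
import java.util.Arrays;

// Hypothetical builder that remembers the highest slot it has written,
// so cleanup can clear every populated slot regardless of count.
final class ScratchBuilder {
    Object[] values = new Object[64];
    int count;
    private int highWaterMark; // one past the highest index written since cleanup

    void add(Object value) {
        if (count == values.length) {
            values = Arrays.copyOf(values, count * 2);
        }
        values[count++] = value;
        highWaterMark = Math.max(highWaterMark, count);
    }

    // Assumed for illustration: a step that lowers count without clearing slots.
    void collapseTo(int kept) {
        count = Math.min(count, kept);
    }

    // Clears [0, highWaterMark) rather than [0, count).
    void cleanup() {
        Arrays.fill(values, 0, highWaterMark, null);
        count = 0;
        highWaterMark = 0;
    }

    int retainedEntries() {
        int retained = 0;
        for (Object value : values) {
            if (value != null) {
                retained++;
            }
        }
        return retained;
    }
}

final class WorkerSketch {
    // Plain ThreadLocal used as a stand-in for Netty's FastThreadLocal.
    static final ThreadLocal<ScratchBuilder> SCRATCH =
            ThreadLocal.withInitial(ScratchBuilder::new);

    static ScratchBuilder runTask(int entries, int keptAfterCollapse) {
        ScratchBuilder builder = SCRATCH.get();
        for (int i = 0; i < entries; i++) {
            builder.add(new byte[1024]);
        }
        builder.collapseTo(keptAfterCollapse);
        builder.cleanup();
        return builder;
    }

    // Hypothetical hook a worker thread could call when it runs out of tasks.
    static void onIdle() {
        SCRATCH.remove();
    }
}
```

Either measure alone would limit the damage in this sketch: a complete cleanup keeps the pooled builder small, and the idle hook keeps a long-lived thread from pinning whatever the builder still holds.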

> FastThreadLocal leaks memory
> ----------------------------
>
>                 Key: CASSANDRA-13754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13754
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15
>            Reporter: Eric Evans
>            Assignee: Robert Stupp
>             Fix For: 3.11.1
>
>
> After a chronic bout of {{OutOfMemoryError}} in our development environment, 
> a heap analysis is showing that more than 10G of our 12G heaps are consumed 
> by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) 
> of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances.  
> Reverting 
> [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54]
>  fixes the issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

