[ https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768374#comment-16768374 ]
Benedict edited comment on CASSANDRA-15006 at 2/14/19 3:09 PM:
---------------------------------------------------------------

Hi [~jborgstrom],

The {{DirectByteBufferR}} simply means it is a read-only byte buffer. It might be a mapped buffer.

Unfortunately, given the conflation of terms around {{BufferPool}}, it is hard to understand what your graphs mean. Could you explicitly define them for me? What does each graph title map to directly, and how is it being produced?

I would not bother truncating any table. Ideally, we would get a heap dump posted somewhere privately for us to download and analyse.

We also really need to understand the memory environment of the node: you indicate you have limited the process to 3GiB by cgroups, but we can see much more than this committed to the process. Could you please also post a full log file from the node, so we can see at least what configuration settings it is starting with? It may be that this is all completely acceptable.

There is still insufficient information to say for sure that there is a leak, rather than simply incremental growth within the defined configuration bounds.
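For reference, here is a minimal sketch of how the JVM-level buffer pool figures can be read. It assumes the graphs are derived from the platform {{BufferPoolMXBean}} instances, which are what the "java.nio:type=BufferPool,name=direct" MBean exposes. That is an assumption, since the thread does not say how the graphs are produced. The class name {{BufferPoolReport}} and the {{snapshot}} helper are hypothetical names used only for illustration. These JVM pools are distinct from Cassandra's own internal buffer pool.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads the JVM's buffer pool MXBeans
// (java.nio:type=BufferPool,name=direct and name=mapped).
public class BufferPoolReport {

    // Returns pool name -> { buffer count, memory used (bytes), total capacity (bytes) }.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> pools = new LinkedHashMap<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            pools.put(pool.getName(), new long[] {
                pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity()
            });
        }
        return pools;
    }

    public static void main(String[] args) {
        // Print one line per pool so each graph can be matched to a concrete attribute.
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            long[] v = e.getValue();
            System.out.printf("%s: count=%d memoryUsed=%d totalCapacity=%d%n",
                    e.getKey(), v[0], v[1], v[2]);
        }
    }
}
```

Run inside the node's JVM, or read remotely over JMX, the three attributes make it explicit which quantity a graph titled "BufferPool" is plotting.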

> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
>                 Key: CASSANDRA-15006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but
> that did not seem to make any difference.
>            Reporter: Jonas Borgström
>            Priority: Major
>         Attachments: CASSANDRA-15006-reference-chains.png,
> Screenshot_2019-02-04 Grafana - Cassandra.png,
> Screenshot_2019-02-14 Grafana - Cassandra(1).png,
> Screenshot_2019-02-14 Grafana - Cassandra.png
>
>
> While testing a 3 node 3.11.3 cluster I noticed that the nodes were suddenly
> killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days it sure
> looks like the
> "java.nio:type=BufferPool,name=direct" Mbean shows a very linear growth
> (approx 15MiB/24h, see attached screenshot). Is this expected to keep growing
> linearly after 12 days with a constant load?
>
> In my setup the growth/leak is about 15MiB/day so I guess in most setups it
> would take quite a few days until it becomes noticeable. I'm able to see the
> same type of slow growth in other production clusters even though the graph
> data is more noisy.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)