[ https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586335#comment-14586335 ]
Ivar Thorson commented on CASSANDRA-9549:
-----------------------------------------

As another data point, we upgraded our servers to 2.1.6 and see the same issue.

> Memory leak
> ------------
>
> Key: CASSANDRA-9549
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9549
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Cassandra 2.1.5. 9 node cluster in EC2 (m1.large nodes,
> 2 cores 7.5G memory, 800G platter for cassandra data, root partition and
> commit log are on SSD EBS with sufficient IOPS), 3 nodes/availability zone, 1
> replica/zone
> JVM: /usr/java/jdk1.8.0_40/jre/bin/java
> JVM Flags besides CP: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
> -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms2G -Xmx2G -Xmn200M
> -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler
> -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.rmi.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra
> -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
> Kernel: Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Ivar Thorson
> Priority: Critical
> Fix For: 2.1.x
>
> Attachments: c4_system.log, c7fromboot.zip, cassandra.yaml,
> cpu-load.png, memoryuse.png, ref-java-errors.jpeg, suspect.png, two-loads.png
>
>
> We have been experiencing a severe
> memory leak with Cassandra 2.1.5 that,
> over the period of a couple of days, eventually consumes all of the available
> JVM heap space, putting the JVM into GC hell where it keeps trying CMS
> collection but can't free up any heap space. This pattern happens for every
> node in our cluster and is requiring rolling cassandra restarts just to keep
> the cluster running. We have upgraded the cluster per Datastax docs from the
> 2.0 branch a couple of months ago and have been using the data from this
> cluster for more than a year without problem.
> As the heap fills up with non-GC-able objects, the CPU/OS load average grows
> along with it. Heap dumps reveal an increasing number of
> java.util.concurrent.ConcurrentLinkedQueue$Node objects. We took heap dumps
> over a 2 day period, and watched the number of Node objects go from 4M, to
> 19M, to 36M, and eventually about 65M objects before the node stops
> responding. The screen capture of our heap dump is from the 19M measurement.
> Load on the cluster is minimal. We can see this effect even with only a
> handful of writes per second. (See attachments for Opscenter snapshots during
> very light loads and heavier loads). Even with only 5 reads a sec we see this
> behavior.
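
For anyone wanting to track the same growth without taking full heap dumps, here is a minimal sketch that reads the text output of `jmap -histo <pid>` and pulls out the instance count for one class. The helper names (`instance_count`, `growth`) are hypothetical and the row layout assumed by the regex is the JDK 8 style (`num: #instances #bytes class name`); this is an illustration, not the tooling used in the report.

```python
import re

# Assumed row layout of `jmap -histo` output, for example:
#    1:      19000000    456000000  java.util.concurrent.ConcurrentLinkedQueue$Node
HISTO_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def instance_count(histo_text, class_name):
    """Return the instance count for class_name, or 0 if it is not listed."""
    for line in histo_text.splitlines():
        match = HISTO_ROW.match(line)
        if match and match.group(3) == class_name:
            return int(match.group(1))
    return 0


def growth(counts):
    """Return the change in count between consecutive snapshots."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Node counts quoted in the report (4M, 19M, 36M, about 65M).
    reported = [4_000_000, 19_000_000, 36_000_000, 65_000_000]
    print(growth(reported))
```

Fed the four counts from the report, `growth` shows an increase at every snapshot, which fits a leak rather than a cache that levels off.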
> Log files show repeated errors in Ref.java:181 and Ref.java:279 and "LEAK
> detected" messages:
> {code}
> ERROR [CompactionExecutor:557] 2015-06-01 18:27:36,978 Ref.java:279 - Error
> when closing class
> org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@1302301946:/data1/data/ourtablegoeshere-ka-1150
> java.util.concurrent.RejectedExecutionException: Task
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@32680b31
> rejected from
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@573464d6[Terminated,
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1644]
> {code}
> {code}
> ERROR [Reference-Reaper:1] 2015-06-01 18:27:37,083 Ref.java:181 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@74b5df92) to class
> org.apache.cassandra.io.sstable.SSTableReader$DescriptorTypeTidy@2054303604:/data2/data/ourtablegoeshere-ka-1151
> was not released before the reference was garbage collected
> {code}
> This might be related to [CASSANDRA-8723]?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
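
To see how often the Ref.java errors quoted in the issue recur on a node, a rough sketch along these lines could tally them from system.log. It assumes each log entry starts on its own unwrapped line with the prefix seen in the excerpts (`ERROR [thread] date time Ref.java:<line> - message`); the function name `count_ref_errors` is hypothetical.

```python
import re
from collections import Counter

# Assumed entry prefix, taken from the excerpts in the issue, for example:
# ERROR [Reference-Reaper:1] 2015-06-01 18:27:37,083 Ref.java:181 - LEAK DETECTED: ...
REF_ERROR = re.compile(r"^ERROR \[([^\]]+)\] \S+ \S+ Ref\.java:(\d+) - (.*)")


def count_ref_errors(lines):
    """Count Ref.java ERROR entries, keyed by (thread pool, source line)."""
    counts = Counter()
    for line in lines:
        match = REF_ERROR.match(line)
        if match:
            thread, source_line, _message = match.groups()
            pool = thread.split(":")[0]  # drop the per-thread number
            counts[(pool, int(source_line))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_ref_errors.py /var/log/cassandra/system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for key, total in count_ref_errors(log).most_common():
            print(key, total)
```

Grouping by thread pool and source line keeps the Ref.java:279 close failures separate from the Ref.java:181 leak reports, so the two can be compared over time.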