[jira] [Commented] (CASSANDRA-9549) Memory leak in Ref.GlobalState due to pathological ConcurrentLinkedQueue.remove behaviour

2016-04-12 Thread stone (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238615#comment-15238615
 ] 

stone commented on CASSANDRA-9549:
--

@Benedict Thanks for your answer. I understand now.

I opened a ticket, https://issues.apache.org/jira/browse/CASSANDRA-11460. At 
first I thought it was the same problem as this one, but I now realize I was 
mistaken.

That ticket has been open for about two weeks with no response. Could you take 
a look?

> Memory leak in Ref.GlobalState due to pathological 
> ConcurrentLinkedQueue.remove behaviour
> -
>
> Key: CASSANDRA-9549
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9549
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 2.1.5. 9 node cluster in EC2 (m1.large nodes, 
> 2 cores 7.5G memory, 800G platter for cassandra data, root partition and 
> commit log are on SSD EBS with sufficient IOPS), 3 nodes/availability zone, 1 
> replica/zone
> JVM: /usr/java/jdk1.8.0_40/jre/bin/java 
> JVM Flags besides CP: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
> -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities 
> -XX:ThreadPriorityPolicy=42 -Xms2G -Xmx2G -Xmn200M 
> -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
> -XX:+UseTLAB -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler 
> -XX:CMSWaitDuration=1 -XX:+CMSParallelInitialMarkEnabled 
> -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=1 -XX:+UseCondCardMark 
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
> -Dcom.sun.management.jmxremote.rmi.port=7199 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra 
> -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid 
> Kernel: Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Ivar Thorson
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.7
>
> Attachments: c4_system.log, c7fromboot.zip, cassandra.yaml, 
> cpu-load.png, memoryuse.png, ref-java-errors.jpeg, suspect.png, two-loads.png
>
>
> We have been experiencing a severe memory leak with Cassandra 2.1.5 that, 
> over the period of a couple of days, eventually consumes all of the available 
> JVM heap space, putting the JVM into GC hell where it keeps trying CMS 
> collection but can't free up any heap space. This pattern happens for every 
> node in our cluster and requires rolling Cassandra restarts just to keep 
> the cluster running. We upgraded the cluster from the 2.0 branch per the 
> DataStax docs a couple of months ago, and have been using the data from this 
> cluster for more than a year without problem.
> As the heap fills up with non-GC-able objects, the CPU/OS load average grows 
> along with it. Heap dumps reveal an increasing number of 
> java.util.concurrent.ConcurrentLinkedQueue$Node objects. We took heap dumps 
> over a 2 day period, and watched the number of Node objects go from 4M, to 
> 19M, to 36M, and eventually about 65M objects before the node stops 
> responding. The screen capture of our heap dump is from the 19M measurement.
> Load on the cluster is minimal. We can see this effect even with only a 
> handful of writes per second. (See attachments for Opscenter snapshots during 
> very light loads and heavier loads). Even with only 5 reads a sec we see this 
> behavior.
> Log files show repeated errors in Ref.java:181 and Ref.java:279 and "LEAK 
> detected" messages:
> {code}
> ERROR [CompactionExecutor:557] 2015-06-01 18:27:36,978 Ref.java:279 - Error 
> when closing class 
> org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@1302301946:/data1/data/ourtablegoeshere-ka-1150
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@32680b31 
> rejected from 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@573464d6[Terminated,
>  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1644]
> {code}
> {code}
> ERROR [Reference-Reaper:1] 2015-06-01 18:27:37,083 Ref.java:181 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@74b5df92) to class 
> org.apache.cassandra.io.sstable.SSTableReader$DescriptorTypeTidy@2054303604:/data2/data/ourtablegoeshere-ka-1151
>  was not released before the reference was garbage collected
> {code}
> This might be related to [CASSANDRA-8723]?




[jira] [Commented] (CASSANDRA-9549) Memory leak in Ref.GlobalState due to pathological ConcurrentLinkedQueue.remove behaviour

2016-04-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233832#comment-15233832
 ] 

Benedict commented on CASSANDRA-9549:
-

What is obtuse?

bq. how to resolve?

Move to a version >= fixVersion, i.e. 2.1.7

bq. why this happen

The [last comment with more than one 
sentence|https://issues.apache.org/jira/browse/CASSANDRA-9549?focusedCommentId=14586587&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14586587], 
only six comments back, spells out what happened and why.

I realise JIRA noise can be quite an issue in many cases, but in this instance 
it seems to me that just a modicum of effort was necessary to find the answers 
you sought.




[jira] [Commented] (CASSANDRA-9549) Memory leak in Ref.GlobalState due to pathological ConcurrentLinkedQueue.remove behaviour

2016-04-09 Thread stone (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233823#comment-15233823
 ] 

stone commented on CASSANDRA-9549:
--

Could you add a summary now that the issue is resolved? Why did it happen, and 
how should affected users resolve it? Otherwise it is hard for people who hit 
the same issue to find the answer.






[jira] [Commented] (CASSANDRA-9549) Memory leak in Ref.GlobalState due to pathological ConcurrentLinkedQueue.remove behaviour

2015-10-22 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968882#comment-14968882
 ] 

Maxim Podkolzine commented on CASSANDRA-9549:
-

Is this bug fixed in Cassandra 2.2.0?






[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-17 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589683#comment-14589683
 ] 

Marcus Eriksson commented on CASSANDRA-9549:


+1



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589537#comment-14589537
 ] 

Benedict commented on CASSANDRA-9549:
-

I've added a regression test to the branch. Could I get a reviewer please, and 
can we ship this soon?



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-17 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589997#comment-14589997
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

We patched our 2.1.6 cluster on Wednesday and let it run for a day to let 
things accumulate. Looking at CPU activity and heap space for the last day 
suggests that the memory leak seems to have been fixed by the patch. Awesome 
work!



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1458#comment-1458
 ] 

Benedict commented on CASSANDRA-9549:
-

Great, glad to hear it



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-15 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586425#comment-14586425
 ] 

Jeff Jirsa commented on CASSANDRA-9549:
---

Throwing a me-too here, copying summary from IRC (on the topic of 2.1.6 showing 
weird memory behavior that feels like a leak). Other user was also using DTCS:

11:07 <jeffj> opened CASSANDRA-9597 last night. dtcs + streaming = lots of 
sstables that won't compact efficiently and eventually (days after load is 
stopped) nodes end up ooming or in gc hell.
11:08 <jeffj> in our case, the PROBLEM is that sstables build up over time due 
to the weird way dtcs is selecting candidates to compact, but the symptom is 
very very very long gc pauses and eventual ooms.
11:10 <jeffj> i would very much believe there's a leak somewhere in 2.1.6. in 
our case, we saw the same behavior in 2.1.5, so i dont think it's a single 
minor version regression




[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586460#comment-14586460
 ] 

Robbie Strickland commented on CASSANDRA-9549:
--

We also experience this issue on 2.1.5, and also running DTCS.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-15 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586335#comment-14586335
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

As another data point, we upgraded our servers to 2.1.6 and see the same issue.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586549#comment-14586549
 ] 

Benedict commented on CASSANDRA-9549:
-

Sorry for the slow response. This one slipped off my work queue. I've pushed a 
fix [here|https://github.com/belliottsmith/cassandra/tree/9549]. The problem is 
that I made erroneous assumptions about the behaviour of CLQ on remove (I've 
read too many CLQ implementations to keep them all straight, I guess): on 
remove, it does not unlink the node it has removed, it only sets the item to 
null. This means we accumulate the CLQ nodes for the whole lifetime of the Ref 
(in this case an sstable). DTCS obviously exacerbates this by ensuring sstable 
lifetimes are infinite.

This patch simply swaps that to a CLDeque. This has some undesirable 
properties, so we should probably hasten CASSANDRA-9379. This would have 
prevented this, and will generally improve our management of Ref instances.

I've also filed a follow up ticket, CASSANDRA-9600, which would have mitigated 
this.
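
To make the failure mode described above concrete, here is a minimal standalone 
sketch (illustrative only; it is not Cassandra code or the patch, and the class 
and variable names are hypothetical). It mimics a queue that lives as long as 
the sstable it tracks, holding one long-lived entry plus a stream of short-lived 
ones: if remove() leaves the dead tail node linked, the node chain behind the 
long-lived entry keeps growing even though the logical size stays at 1.

{code}
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only (hypothetical names, not Cassandra code or the patch).
// A long-lived queue stands in for the per-sstable collection of Ref states: one
// entry (think of the sstable's own reference) stays for the queue's whole
// lifetime, while short-lived references are added and removed behind it.
public class ClqNodeAccumulationSketch
{
    public static void main(String[] args) throws Exception
    {
        ConcurrentLinkedQueue<Object> locallyExtant = new ConcurrentLinkedQueue<>();
        locallyExtant.add(new Object());          // long-lived entry: head never advances past it

        for (int i = 0; i < 50_000; i++)
        {
            Object refState = new Object();       // a short-lived reference is taken...
            locallyExtant.add(refState);
            locallyExtant.remove(refState);       // ...and released; the removed element is the tail
        }
        // Logical size is 1, but on a JVM whose remove() does not unlink a tail node,
        // the dead nodes stay chained behind the long-lived entry, so a heap histogram
        // (e.g. jmap -histo <pid>) can still show a large ConcurrentLinkedQueue$Node count.
        // Each remove() also walks that growing chain, so the loop itself slows down.
        System.out.println("logical size = " + locallyExtant.size());
        Thread.sleep(60_000);                     // keep the queue alive for inspection
    }
}
{code}

Watching the ConcurrentLinkedQueue$Node count across successive heap histograms, 
as the reporter did, is enough to spot this kind of accumulation.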



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586556#comment-14586556
 ] 

Benedict commented on CASSANDRA-9549:
-

Actually, scratch that... it does look like CLQ should remove the node. And 
yet it isn't doing so, if the heap dump is to be believed. I suspect the 
patched branch will fix the problem, but I will see if I can puzzle out a 
plausible mechanism by which the nodes are accumulating.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586587#comment-14586587
 ] 

Benedict commented on CASSANDRA-9549:
-

Ahhh. So, there is a pathological case in CLQ.remove. If the item you delete 
was the last to be inserted, it will not expunge its node. However, it also 
does not expunge any deleted items en route to the end. So, if you retain the 
first item to be inserted and always delete the last, you get an infinitely 
growing, but completely empty, middle of the CLQ. This is pretty easily 
avoided, so it might be worth an upstream patch to the JDK. For now, though, 
the patch I uploaded should fix the problem (of which I'm more confident now 
that there is an explanatory framework), and CASSANDRA-9379 remains the 
correct follow-up to ensure no pathological list behaviours (e.g. with lots of 
extant Ref instances).
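
For illustration, here is a minimal standalone sketch (not Cassandra code) of 
the access pattern just described: retain the first-inserted element, then 
repeatedly add and immediately remove the most recently inserted one. On a JDK 
affected by this CLQ.remove behaviour, the queue is logically of size one at 
the end, yet a heap dump shows the internal node count growing without bound:
{code}
import java.util.concurrent.ConcurrentLinkedQueue;

public class ClqRemoveLeakSketch
{
    public static void main(String[] args)
    {
        ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();
        queue.add(new Object()); // the retained, first-inserted element

        for (int i = 0; i < 10_000_000; i++)
        {
            Object last = new Object();
            queue.add(last);    // always append to the tail...
            queue.remove(last); // ...and always delete the last-inserted element
        }

        // Logically the queue still holds just the one retained element, but
        // because remove() of the most recently inserted item does not unlink
        // its node (and does not expunge dead nodes on the way to the tail),
        // a heap dump taken here shows millions of ConcurrentLinkedQueue$Node
        // instances, matching the growth seen in the reporter's dumps.
        System.out.println("logical size = " + queue.size());
    }
}
{code}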


[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-08 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577420#comment-14577420
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

https://drive.google.com/a/whibse.com/file/d/0BxS4YrlxXzqAODNaTHBqY2ZGZlE/view?usp=sharing



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575682#comment-14575682
 ] 

Benedict commented on CASSANDRA-9549:
-

Wherever is convenient for you to put it.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574121#comment-14574121
 ] 

Benedict commented on CASSANDRA-9549:
-

That still seems to be missing the usual startup log messages, and the node 
must have been running for some time, since the CompactionExecutor and 
MemtableFlusher pools both have thread IDs above 100.

It looks like it was already under significant heap pressure at that time. 
Unfortunately it is very hard to say why, likely even with the complete logs. 
At this point we really need a heap dump to analyse.
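
For reference, the usual way to capture one is to run jmap against the 
Cassandra process (something like jmap -dump:live,format=b,file=heap.hprof 
<pid>). Programmatically, the equivalent on a HotSpot JVM is the 
HotSpotDiagnostic MXBean; a minimal sketch, which dumps the heap of the JVM it 
runs in (the output path here is just an example):
{code}
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpSketch
{
    public static void main(String[] args) throws Exception
    {
        // Obtain the HotSpot diagnostic MXBean for the current JVM and write an
        // hprof-format dump of live objects; the target file must not yet exist.
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        hotspot.dumpHeap("/tmp/cassandra-heap.hprof", true);
    }
}
{code}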



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-05 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574905#comment-14574905
 ] 

Joshua McKenzie commented on CASSANDRA-9549:


CASSANDRA-8092 is still open. We have quite a few more swallowed exceptions 
since the last time I went through the code-base and fixed them:

{noformat}
Total caught and rethrown as something other than Runtime: 82
Total caught and rethrown as Runtime: 68
Total Swallowed: 40
Total delegated to JVMStabilityInspector: 66
Total 'catch (Throwable ...)' analyzed: 79
Total 'catch (Exception ...)' analyzed: 177
Total catch clauses analyzed: 256
{noformat}

So in this instance, I wouldn't bank on the shutdown hook having been 
unregistered.
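
As a purely illustrative sketch (a hypothetical class, not code from the 
Cassandra tree), the difference between the "swallowed" and "delegated" 
categories counted above looks roughly like this, assuming the existing 
org.apache.cassandra.utils.JVMStabilityInspector helper:
{code}
import org.apache.cassandra.utils.JVMStabilityInspector;

public class CatchPatternSketch
{
    // "Total Swallowed": the Throwable disappears silently, so even an
    // OutOfMemoryError or a disk failure is hidden and the node limps on.
    static void swallowed()
    {
        try
        {
            riskyOperation();
        }
        catch (Throwable t)
        {
            // nothing logged, nothing rethrown
        }
    }

    // "Total delegated to JVMStabilityInspector" plus "rethrown as Runtime":
    // fatal errors get a chance to trigger the configured failure policy
    // before the exception is propagated.
    static void delegated()
    {
        try
        {
            riskyOperation();
        }
        catch (Throwable t)
        {
            JVMStabilityInspector.inspectThrowable(t);
            throw new RuntimeException(t);
        }
    }

    private static void riskyOperation()
    {
        // placeholder for real work that may throw
    }
}
{code}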


[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-05 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574811#comment-14574811
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

Sorry, I had difficulty figuring out where the log starts because I've been 
working from a large, concatenated file, and keep mixing UTC and PST time 
zones. I uploaded c7fromboot.zip, which seems to start from the right place.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-05 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575430#comment-14575430
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

I'd be happy to provide a heap dump, but even zipped it's 200MB. FTP? Google 
drive?



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573595#comment-14573595
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

We have tried increasing the JVM heap size slightly, to 3G, but we see the 
same issues. We cannot increase the heap size much more before reaching an 
unreasonably large fraction of total system memory (7.5G). We are not doing 
extensive deletions or overwrites.

The log exceeds 20MB even when compressed; I'll try to cut it down a bit and 
find the right starting point.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573528#comment-14573528
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

Log file uploaded. We're running the DataStax RPMs and restarting with 
"service cassandra restart".



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573538#comment-14573538
 ] 

Benedict commented on CASSANDRA-9549:
-

Thanks.

This error specifically is related to that change, but the underlying cause 
most likely is not. With the full log file we can probably glean enough 
information to suppress this _presentation_ of the problem, but the service 
would still be shut down while the system is running, and this would 
eventually lead to other problems.





[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573640#comment-14573640
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

Uploaded a new log for our c7 node, after spending time finding when the node 
was last restarted. Let me know if I am still truncating the log at the wrong 
points.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573567#comment-14573567
 ] 

Benedict commented on CASSANDRA-9549:
-

Thanks. Unfortunately that does not seem to be the complete log history. It 
would help a great deal to have logs from when the node actually started up.

I can make an educated guess, though: it looks like the node was OOMing for 
normal operational reasons (or perhaps some other issue; we cannot say), and we 
recently modified the behaviour in this scenario to trigger a shutdown of the 
host. Unfortunately, it seems that the OOM is somehow delaying the shutdown 
from completing, or perhaps there is some other issue. Certainly the JVM thinks 
it is shutting down.

The strange thing is that the shutdown hook must still have been run, since 
that is the only way the executor service could have been shut down; however, 
we ask for the shutdown hook to be removed in this event.
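
To make that inference concrete, here is a small self-contained sketch 
(hypothetical names, not Cassandra's actual shutdown code) of how a shutdown 
hook that terminates a scheduled executor produces exactly this kind of 
rejection from a [Terminated] pool once it has run:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ShutdownHookSketch
{
    public static void main(String[] args)
    {
        ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(1);

        // Stand-in for a hook that stops the scheduled executors on exit.
        Thread hook = new Thread(scheduled::shutdownNow, "shutdown-hook-sketch");
        Runtime.getRuntime().addShutdownHook(hook);

        // The intended abnormal-exit path would remove the hook so the pool is
        // never terminated by it:
        //     Runtime.getRuntime().removeShutdownHook(hook);
        // If that removal does not happen and the hook fires anyway...
        hook.run(); // simulate the hook having run

        // ...then any further submission is rejected from a terminated pool,
        // which is the shape of the CompactionExecutor errors in the logs.
        try
        {
            scheduled.schedule(() -> { }, 1, TimeUnit.SECONDS);
        }
        catch (RejectedExecutionException e)
        {
            System.out.println("rejected from terminated pool: " + e);
        }
    }
}
{code}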

More complete logs would help us.

Increasing your heap space may fix the underlying problem, but it may also be 
that some other underlying issue is causing your heap to explode; to establish 
that we would need a heap dump taken during one of these events. If, however, 
you make extensive use of CQL row deletions, or of CQL collections with 
overwrites of the entire collection, you may be encountering CASSANDRA-9486, 
for which a patch is already available and which will be fixed in 2.1.6, to be 
released shortly.
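
The flags in the environment above already include -XX:+HeapDumpOnOutOfMemoryError, which only fires at the OOM itself; to capture a dump while the Node count is still climbing, jmap can be attached from outside, or the HotSpot diagnostic MXBean can be invoked from inside the JVM. A minimal sketch of the latter (the output path is illustrative):
{code}
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumpSketch
{
    public static void main(String[] args) throws Exception
    {
        // Proxy to the HotSpot diagnostic MXBean registered in the platform MBean server.
        HotSpotDiagnosticMXBean diagnostics = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // live=true restricts the dump to reachable objects, which is what
        // matters when counting retained ConcurrentLinkedQueue$Node instances.
        diagnostics.dumpHeap("/tmp/cassandra-heap.hprof", true);
    }
}
{code}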


[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573456#comment-14573456
 ] 

Benedict commented on CASSANDRA-9549:
-

It's possible there is a script in their environment running periodically, 
asking the servers to drain. There are really very few ways for that executor 
service to be shut down (assuming it's the executor submitted to inside the 
method throwing the RejectedExecutionException; it's hard to say with absolute 
certainty because the stack trace has been compressed due to the frequency of 
the error generation): the shutdown hook indicating the VM is terminating, or 
the drain() command.

As I said, though, more info means we can speak with greater certainty. The 
full log history since restart would be a great start. A thread dump would be 
the natural follow-on if that were not sufficiently helpful.
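
For reference, this is plain java.util.concurrent behaviour rather than Cassandra code: once shutdown() has run on a scheduled executor, whether from drain() or from the VM's shutdown hook, any later schedule() call is refused with a RejectedExecutionException naming the terminated pool, exactly the shape of the errors in the log. A minimal sketch:
{code}
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectedAfterShutdown
{
    public static void main(String[] args)
    {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

        // Whichever path ran first, drain() or the VM shutdown hook, did this:
        executor.shutdown();

        try
        {
            // A tidy task submitted afterwards is refused, producing a message of
            // the same shape as "rejected from ...[Terminated, pool size = 0, ...]".
            executor.schedule(() -> System.out.println("tidy"), 1, TimeUnit.SECONDS);
        }
        catch (RejectedExecutionException e)
        {
            System.out.println(e);
        }
    }
}
{code}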


[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573492#comment-14573492
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

Our sysadmin has been running a drain just before restarting, but that is not 
periodic. The only periodic crontab command is a weekly repair of each node, 
done in a rolling fashion. We looked for a correlation between that and this 
memory leak problem and found none. Is there something else that could cause 
this drain-like behavior?



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573400#comment-14573400
 ] 

Benedict commented on CASSANDRA-9549:
-

Looks like you've called drain(), but the server is still up and trying to do 
work...

A full system log (going back to node startup) could help, but this situation 
should be pretty atypical. Restarting the node should be enough to correct it.
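
If it is unclear whether a given node believes it has been drained, the StorageService bean can be queried over JMX. A minimal sketch, assuming the node exposes the standard org.apache.cassandra.db:type=StorageService bean and its OperationMode attribute, and using the JMX settings from the environment above (port 7199, authentication disabled):
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class OperationModeCheck
{
    public static void main(String[] args) throws Exception
    {
        // JMX endpoint as configured by the reported flags (port 7199, no auth).
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Typically NORMAL while serving traffic; DRAINING/DRAINED once a drain has run.
            System.out.println(mbs.getAttribute(storageService, "OperationMode"));
        }
        finally
        {
            connector.close();
        }
    }
}
{code}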



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573404#comment-14573404
 ] 

Philip Thompson commented on CASSANDRA-9549:


The original description says it's happening for every node in the cluster, and 
that they've all been restarted.



[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573506#comment-14573506
 ] 

Benedict commented on CASSANDRA-9549:
-

Without the log file there is very little more I can tell you. The only two 
places the executor service is explicitly shut down are:

# a drain; and
# the VM executing its shutdown hooks

The only two places a drain occurs are:

# via NodeTool drain
# receipt of a gossip remove node message (which should, by my understanding, 
only be triggered by a NodeTool remove command)

It's possible something else is awry, but we have very little information to 
work with.

Is it possible you are running an embedded Cassandra, so that the Cassandra 
instance restarts without the JVM restarting? Or is it possible you are 
draining more nodes than you intend to during the restart process?


[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-04 Thread Ivar Thorson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573513#comment-14573513
 ] 

Ivar Thorson commented on CASSANDRA-9549:
-

I'll look at getting the log and thread dump.

Is this related to changes for [CASSANDRA-8707]?
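
For the thread dump mentioned above, jstack against the Cassandra process is the usual route; if attaching from outside is awkward, much the same information can be pulled from inside the JVM via the platform ThreadMXBean. A minimal sketch (output handling is illustrative):
{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch
{
    public static void main(String[] args)
    {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // true, true = include locked monitors and ownable synchronizers, giving
        // jstack-like detail; note that ThreadInfo.toString() truncates very deep
        // stacks, so real jstack output is still preferable when available.
        for (ThreadInfo info : threads.dumpAllThreads(true, true))
            System.out.print(info);
    }
}
{code}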
