[ https://issues.apache.org/jira/browse/CASSANDRA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311864#comment-14311864 ]

Brent Haines edited comment on CASSANDRA-8723 at 2/9/15 6:55 AM:
-----------------------------------------------------------------

[~jeffl] I ran
{code}
watch -n 10 'nodetool compactionstats'
{code}
on the affected node and watched it for a while. For us it would always end up on the same compaction, of the same CF, where it would get stuck until the OOM happened. The stats on the compaction give you a hint: the total number of bytes is the same each time, then it gets some portion of the way through the compaction, progress freezes, and eventually the system runs out of memory.

We have the standard replication factor of 3, so it was no big deal to stop Cassandra, delete the node's storage of that CF, and then restart and run repair. Care must be taken, obviously, but it did recover steady state for us on 3 separate incidents. Once it's fixed on a node, we haven't had issues return for that node.

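The symptom described above (same total size, progress frozen partway through) can be checked mechanically once the compaction counters have been sampled. The following is only a minimal sketch: `is_stalled`, the `Sample` tuple, and the `min_repeats` threshold are hypothetical names chosen for illustration, and extracting the completed/total byte counts from the `nodetool compactionstats` output is left out because that output format differs between Cassandra versions.

```python
from typing import List, Tuple

# One observation of a single compaction task: (completed_bytes, total_bytes).
# How these are scraped from `nodetool compactionstats` is deliberately omitted.
Sample = Tuple[int, int]


def is_stalled(samples: List[Sample], min_repeats: int = 6) -> bool:
    """Return True if the last `min_repeats` samples show no progress.

    A stall here means the same total size and the same completed count in
    every recent sample, while the task is not yet finished.
    """
    if len(samples) < min_repeats:
        return False  # not enough history to decide
    recent = samples[-min_repeats:]
    completed, total = recent[0]
    if completed >= total:
        return False  # the compaction finished; it is not stuck
    return all(sample == (completed, total) for sample in recent)
```

With one sample every 10 seconds, as in the `watch` command above, `min_repeats=6` corresponds to roughly a minute without progress; a slow but healthy compaction of a large CF may need a larger threshold.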

> Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8723
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8723
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeff Liu
>             Fix For: 2.1.3
>
>         Attachments: cassandra.yaml
>
>
> Issue:
> We have an on-going issue with cassandra nodes running with continuously increasing memory until killed by OOM.
> {noformat}
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783481] Out of memory: Kill process 13919 (java) score 911 or sacrifice child
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783557] Killed process 13919 (java) total-vm:18366340kB, anon-rss:6461472kB, file-rss:6684kB
> {noformat}
> System Profile:
> cassandra version 2.1.2
> system: aws c1.xlarge instance with 8 cores, 7.1G memory.
> cassandra jvm:
> -Xms1792M -Xmx1792M -Xmn400M -Xss256k
> {noformat}
> java -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M
> -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+UseCondCardMark
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1421511249.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60
> -Dlogback.configurationFile=logback.xml
> -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir=
> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp
> /etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar:
> -XX:HeapDumpPath=/var/lib/cassandra/java_1421511248.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1421511248.log
> org.apache.cassandra.service.CassandraDaemon
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
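
For reference, the figures in the issue description can be put side by side with a little arithmetic. This is only a sketch: the two constants are copied from the kernel log line and the JVM flags quoted above, while the helper name `rss_beyond_heap_mb` is hypothetical and the reading of the result (that the growth is outside the Java heap) is an interpretation, not something the report states.

```python
# Figures copied from the quoted issue description.
ANON_RSS_KB = 6461472  # anon-rss reported by the kernel when it killed the process
MAX_HEAP_MB = 1792     # -Xmx1792M


def rss_beyond_heap_mb(anon_rss_kb: int, max_heap_mb: int) -> float:
    """Resident anonymous memory, in MiB, in excess of the maximum Java heap."""
    return anon_rss_kb / 1024 - max_heap_mb


extra = rss_beyond_heap_mb(ANON_RSS_KB, MAX_HEAP_MB)
print(
    f"anon-rss: {ANON_RSS_KB / 1024:.0f} MiB, "
    f"max heap: {MAX_HEAP_MB} MiB, "
    f"beyond heap: {extra:.0f} MiB"
)
# prints: anon-rss: 6310 MiB, max heap: 1792 MiB, beyond heap: 4518 MiB
```

Roughly 4.4 GiB of resident memory therefore sits outside the configured heap on a 7.1G instance, which is why heap settings alone do not explain the kill.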