[ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023298#comment-13023298 ]
Thibaut commented on CASSANDRA-2543:
------------------------------------

I stopped all Java programs, restarted the cluster, and increased the heap size to 5.5 GB. All my tables have a memtable limit of 32 MB, and I only have about 20 tables with 2 column families in each table.

Running a manual compaction, or just waiting, produces the following error:

ERROR [ReadStage:24] 2011-04-22 19:30:58,330 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[ReadStage:24,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:123)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:108)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:93)
        at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:74)
        at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:548)
        at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:95)
        at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1425)
        at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

One table has 778 intermediate tables and won't compact. (Each table is about the size of the memtable flush limit.)
Only the first ones are bigger (e.g. 5 x 19 GB):

-rw-r--r-- 1 root root  18M 2011-04-22 17:07 table_userentries-f-42142-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:10 table_userentries-f-42143-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:12 table_userentries-f-42144-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:15 table_userentries-f-42145-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:17 table_userentries-f-42146-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:20 table_userentries-f-42147-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:23 table_userentries-f-42148-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:26 table_userentries-f-42149-Data.db
-rw-r--r-- 1 root root  13M 2011-04-22 17:33 table_userentries-f-42150-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:37 table_userentries-f-42152-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:38 table_userentries-f-42153-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:39 table_userentries-f-42154-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:40 table_userentries-f-42155-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:42 table_userentries-f-42156-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:43 table_userentries-f-42157-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:44 table_userentries-f-42158-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:45 table_userentries-f-42159-Data.db
-rw-r--r-- 1 root root 6.7M 2011-04-22 19:18 table_userentries-f-42160-Data.db
-rw-r--r-- 1 root root 528M 2011-04-22 19:19 table_userentries-f-42161-Data.db

It's certainly something related to compaction. There are log file entries for Cassandra compacting other tables (Compacting table_....), but this table never shows up, not even when I trigger a manual compaction on it.
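
To put a number on the backlog described above, the `ls -l` output can be summarized per column family. The following is a minimal Python sketch, not anything that ships with Cassandra. The function name `summarize_sstables` and the regular expression are made up for illustration, the file-name layout (`<column family>-<version>-<generation>-Data.db`) is inferred from the names in the listing, and the 32 MB default threshold is simply the memtable limit mentioned earlier.

```python
import re
from collections import defaultdict

# Matches human-readable `ls -l` lines such as:
# -rw-r--r-- 1 root root 17M 2011-04-22 17:10 table_userentries-f-42143-Data.db
# Assumption: names follow <cf>-<version>-<generation>-Data.db, as in the listing.
LINE_RE = re.compile(
    r"\s(?P<size>[\d.]+)(?P<unit>[KMGT]?)\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}\s+"
    r"(?P<cf>.+)-[a-z]+-(?P<gen>\d+)-Data\.db$"
)
UNIT = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def summarize_sstables(listing, small_limit=32 * 1024**2):
    """Return {column_family: (total_files, files_smaller_than_small_limit)}."""
    counts = defaultdict(lambda: [0, 0])
    for line in listing.splitlines():
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip anything that is not a Data.db listing line
        size = float(match.group("size")) * UNIT[match.group("unit")]
        entry = counts[match.group("cf")]
        entry[0] += 1
        if size < small_limit:
            entry[1] += 1
    return {cf: tuple(pair) for cf, pair in counts.items()}
```

Feeding it the full listing of the data directory would show how many of the 778 files are still at roughly flush size, which is a quick way to check whether compaction has made any progress between two runs.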

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings down the entire cluster
> (not giving any answers).
> If I restart the node, the node will hang after a certain amount of time. I
> have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the
> cluster) takes at least 2 minutes. The node uses about 50%-100%
> of CPU across all CPUs.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has
> faulty hardware.
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
> -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp
> /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar
> org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira