[ https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732098#comment-13732098 ]
Aaron Daubman commented on CASSANDRA-5605: ------------------------------------------ Apologies if this isn't directly relevant, but I seem to be experiencing the same issue using 1.2.8 launched via ccm for integration testing. One differentiating feature here is that this happened the first time the node was ever brought up (0 data). The integration tests attempting to use the ccm cluster never completed due to this hang, but all they do is test creation of a fairly simple schema and then attempt to write and then read back a single row. Here's what was in the log: INFO [main] 2013-08-07 14:56:46,763 CassandraDaemon.java (line 118) Logging initialized INFO [main] 2013-08-07 14:56:46,807 CassandraDaemon.java (line 145) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_25 INFO [main] 2013-08-07 14:56:46,808 CassandraDaemon.java (line 183) Heap size: 8248098816/8248098816 INFO [main] 2013-08-07 14:56:46,808 CassandraDaemon.java (line 184) Classpath: /opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/conf:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/build/classes/main:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/build/classes/thrift:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/antlr-3.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/avro-1.4.0-fixes.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/avro-1.4.0-sources-fixes.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/commons-cli-1.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/commons-codec-1.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/commons-lang-2.6.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/compress-lzf-0.8.4.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/concurrentlinkedhashmap-lru-1.3.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/guava-13.0.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/high-scale-lib-1.1.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jackson-core-asl-1.9.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jackson-mapper-asl-1.9.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jamm-0.2.5.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jbcrypt-0.3m.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jline-1.0.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/json-simple-1.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/libthrift-0.7.0.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/log4j-1.2.16.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/lz4-1.1.0.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/metrics-core-2.0.3.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/netty-3.5.9.Final.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/servlet-api-2.5-20081211.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/slf4j-api-1.7.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/slf4j-log4j12-1.7.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/snakeyaml-1.6.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/snappy-java-1.0.5.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/snaptree-0.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jamm-0.2.5.jar INFO [main] 2013-08-07 14:56:46,822 CLibrary.java (line 65) JNA not found. Native methods will be disabled. INFO [main] 2013-08-07 14:56:46,891 DatabaseDescriptor.java (line 132) Loading settings from file:/opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/conf/cassandra.yaml INFO [main] 2013-08-07 14:56:47,821 DatabaseDescriptor.java (line 150) Data files directories: [/opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/data] INFO [main] 2013-08-07 14:56:47,822 DatabaseDescriptor.java (line 151) Commit log directory: /opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/commitlogs INFO [main] 2013-08-07 14:56:47,822 DatabaseDescriptor.java (line 191) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO [main] 2013-08-07 14:56:47,822 DatabaseDescriptor.java (line 205) disk_failure_policy is stop INFO [main] 2013-08-07 14:56:48,000 DatabaseDescriptor.java (line 273) Global memtable threshold is enabled at 2622MB INFO [main] 2013-08-07 14:56:49,142 DatabaseDescriptor.java (line 401) Not using multi-threaded compaction INFO [main] 2013-08-07 14:56:52,610 CacheService.java (line 111) Initializing key cache with capacity of 100 MBs. INFO [main] 2013-08-07 14:56:52,675 CacheService.java (line 140) Scheduling key cache save to each 14400 seconds (going to save all keys). INFO [main] 2013-08-07 14:56:52,678 CacheService.java (line 154) Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider INFO [main] 2013-08-07 14:56:52,698 CacheService.java (line 166) Scheduling row cache save to each 0 seconds (going to save all keys). INFO [main] 2013-08-07 14:56:57,163 DatabaseDescriptor.java (line 535) Couldn't detect any schema definitions in local storage. INFO [main] 2013-08-07 14:56:57,164 DatabaseDescriptor.java (line 540) To create keyspaces and column families, see 'help create' in cqlsh. INFO [main] 2013-08-07 14:56:57,258 CommitLog.java (line 120) No commitlog files found; skipping replay INFO [main] 2013-08-07 14:56:57,715 StorageService.java (line 456) Cassandra version: 1.2.8-SNAPSHOT INFO [main] 2013-08-07 14:56:57,715 StorageService.java (line 457) Thrift API version: 19.36.0 INFO [main] 2013-08-07 14:56:57,716 StorageService.java (line 458) CQL supported versions: 2.0.0,3.0.5 (default: 3.0.5) INFO [main] 2013-08-07 14:56:57,795 StorageService.java (line 483) Loading persisted ring state INFO [main] 2013-08-07 14:56:57,805 StorageService.java (line 564) Starting up server gossip WARN [main] 2013-08-07 14:56:57,823 SystemTable.java (line 573) No host ID found, created 6abf69c6-8472-4607-96b7-4a532ed8e2a8 (Note: This should happen exactly once per node). INFO [main] 2013-08-07 14:56:57,852 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-local@2038679449(367/367 serialized/live bytes, 15 ops) ERROR [FlushWriter:1] 2013-08-07 14:56:57,860 CassandraDaemon.java (line 192) Exception in thread Thread[FlushWriter:1,5,main] java.lang.RuntimeException: Insufficient disk space to write 452 bytes at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:42) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) INFO [ScheduledTasks:1] 2013-08-07 15:14:08,243 GCInspector.java (line 119) GC for ParNew: 2786 ms for 1 collections, 230275096 used; max is 8248098816 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,250 StatusLogger.java (line 53) Pool Name Active Pending Blocked INFO [ScheduledTasks:1] 2013-08-07 15:14:08,291 StatusLogger.java (line 68) MemtablePostFlusher 1 1 0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,292 StatusLogger.java (line 68) FlushWriter 0 0 0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,293 StatusLogger.java (line 68) commitlog_archiver 0 0 0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,294 StatusLogger.java (line 73) CompactionManager 0 0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,522 StatusLogger.java (line 85) MessagingService n/a 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,523 StatusLogger.java (line 95) Cache Type Size Capacity KeysToSave Provider INFO [ScheduledTasks:1] 2013-08-07 15:14:08,524 StatusLogger.java (line 96) KeyCache 0 104857600 all INFO [ScheduledTasks:1] 2013-08-07 15:14:08,525 StatusLogger.java (line 102) RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider INFO [ScheduledTasks:1] 2013-08-07 15:14:08,525 StatusLogger.java (line 109) ColumnFamily Memtable ops,data INFO [ScheduledTasks:1] 2013-08-07 15:14:08,526 StatusLogger.java (line 112) system.local 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,526 StatusLogger.java (line 112) system.peers 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,527 StatusLogger.java (line 112) system.batchlog 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,527 StatusLogger.java (line 112) system.NodeIdInfo 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,528 StatusLogger.java (line 112) system.LocationInfo 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,528 StatusLogger.java (line 112) system.Schema 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,529 StatusLogger.java (line 112) system.Migrations 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,529 StatusLogger.java (line 112) system.schema_keyspaces 8,251 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,530 StatusLogger.java (line 112) system.schema_columns 398,24717 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,530 StatusLogger.java (line 112) system.schema_columnfamilies 369,22187 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,530 StatusLogger.java (line 112) system.IndexInfo 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,531 StatusLogger.java (line 112) system.range_xfers 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,531 StatusLogger.java (line 112) system.peer_events 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,532 StatusLogger.java (line 112) system.hints 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,532 StatusLogger.java (line 112) system.HintsColumnFamily 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,532 StatusLogger.java (line 112) system_traces.sessions 0,0 INFO [ScheduledTasks:1] 2013-08-07 15:14:08,533 StatusLogger.java (line 112) system_traces.events 0,0 > Crash caused by insufficient disk space to flush > ------------------------------------------------ > > Key: CASSANDRA-5605 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5605 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.2.5 > Environment: java version "1.7.0_15" > Reporter: Dan Hendry > Priority: Minor > Fix For: 2.0.1 > > > A few times now I have seen our Cassandra nodes crash by running themselves > out of memory. It starts with the following exception: > {noformat} > ERROR [FlushWriter:13000] 2013-05-31 11:32:02,350 CassandraDaemon.java (line > 164) Exception in thread Thread[FlushWriter:13000,5,main] > java.lang.RuntimeException: Insufficient disk space to write 8042730 bytes > at > org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:42) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:722) > {noformat} > After which, it seems the MemtablePostFlusher stage gets stuck and no further > memtables get flushed: > {noformat} > INFO [ScheduledTasks:1] 2013-05-31 11:59:12,467 StatusLogger.java (line 68) > MemtablePostFlusher 1 32 0 > INFO [ScheduledTasks:1] 2013-05-31 11:59:12,469 StatusLogger.java (line 73) > CompactionManager 1 2 > {noformat} > What makes this ridiculous is that, at the time, the data directory on this > node had 981GB free disk space (as reported by du). We primarily use STCS and > at the time the aforementioned exception occurred, at least one compaction > task was executing which could have easily involved 981GB (or more) worth of > input SSTables. Correct me if I am wrong but but Cassandra counts data > currently being compacted against available disk space. In our case, this is > a significant overestimation of the space required by compaction since a > large portion of the data being compacted has expired or is an overwrite. > More to the point though, Cassandra should not crash because its out of disk > space unless its really actually out of disk space (ie, dont consider > 'phantom' compaction disk usage when flushing). I have seen one of our nodes > die in this way before our alerts for disk space even went off. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira