[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-09-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770869#comment-13770869
 ] 

Jonathan Ellis commented on CASSANDRA-5605:
---

bq. we still need to add an error handler to FlushRunnable so postExecutor does 
not get blocked

I'm not sure we want to unblock it -- if the flush errors out, then we 
definitely don't want commitlog segments getting cleaned up.  What did you have 
in mind?
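
To make the coupling concrete, here is a heavily simplified, hypothetical sketch (class and method names are illustrative, not the actual ColumnFamilyStore/Memtable code) of how a flush task and its post-flush task can be tied together: the post-flush task waits for the flush to finish before discarding commitlog segments, so if the flush throws, the single-threaded post-flush executor stalls behind it.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; names are illustrative only.
public class FlushCoordinator
{
    private final ExecutorService flushWriter = Executors.newFixedThreadPool(2);
    // Single-threaded so post-flush work runs in flush order.
    private final ExecutorService postFlushExecutor = Executors.newSingleThreadExecutor();

    public void submitFlush(final Runnable writeSSTable, final Runnable discardCommitlogSegments)
    {
        final CountDownLatch flushed = new CountDownLatch(1);

        flushWriter.execute(new Runnable()
        {
            public void run()
            {
                writeSSTable.run();  // may throw "Insufficient disk space to write N bytes"
                flushed.countDown(); // never reached if the write throws
            }
        });

        postFlushExecutor.execute(new Runnable()
        {
            public void run()
            {
                try
                {
                    // Blocks forever after a failed flush, stalling every later post-flush task.
                    flushed.await();
                }
                catch (InterruptedException e)
                {
                    throw new RuntimeException(e);
                }
                // Only safe once the memtable data is durably on disk.
                discardCommitlogSegments.run();
            }
        });
    }
}
{code}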

 Crash caused by insufficient disk space to flush
 

 Key: CASSANDRA-5605
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5605
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.5
 Environment: java version 1.7.0_15
Reporter: Dan Hendry
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 1.2.10, 2.0.1


 A few times now I have seen our Cassandra nodes crash by running themselves 
 out of memory. It starts with the following exception:
 {noformat}
 ERROR [FlushWriter:13000] 2013-05-31 11:32:02,350 CassandraDaemon.java (line 
 164) Exception in thread Thread[FlushWriter:13000,5,main]
 java.lang.RuntimeException: Insufficient disk space to write 8042730 bytes
 at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:42)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat} 
 After which, it seems the MemtablePostFlusher stage gets stuck and no further 
 memtables get flushed: 
 {noformat} 
 INFO [ScheduledTasks:1] 2013-05-31 11:59:12,467 StatusLogger.java (line 68) 
 MemtablePostFlusher   132 0
 INFO [ScheduledTasks:1] 2013-05-31 11:59:12,469 StatusLogger.java (line 73) 
 CompactionManager 1 2
 {noformat} 
 What makes this ridiculous is that, at the time, the data directory on this 
 node had 981GB free disk space (as reported by du). We primarily use STCS and 
 at the time the aforementioned exception occurred, at least one compaction 
 task was executing which could easily have involved 981GB (or more) worth of 
 input SSTables. Correct me if I am wrong, but Cassandra counts data 
 currently being compacted against available disk space. In our case, this is 
 a significant overestimation of the space required by compaction since a 
 large portion of the data being compacted has expired or is an overwrite.
 More to the point though, Cassandra should not crash because it's out of disk 
 space unless it really is out of disk space (i.e., don't consider 
 'phantom' compaction disk usage when flushing). I have seen one of our nodes 
 die in this way before our alerts for disk space even went off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-09-18 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770849#comment-13770849
 ] 

Yuki Morishita commented on CASSANDRA-5605:
---

+1 to the patch.

Though (maybe in a separate ticket?) we still need to add an error handler to 
FlushRunnable so postExecutor does not get blocked.
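
For illustration only, an error handler along these lines might catch the flush failure and hand it to the post-flush task instead of leaving it waiting forever (purely hypothetical sketch; names and structure are assumptions, not the shipped FlushRunnable):

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical error-propagation sketch; field and method names are illustrative.
public class GuardedFlush
{
    private final CountDownLatch flushed = new CountDownLatch(1);
    private final AtomicReference<Throwable> flushError = new AtomicReference<Throwable>();

    // Would run on the FlushWriter stage.
    public void runFlush(Runnable writeSSTable)
    {
        try
        {
            writeSSTable.run();
        }
        catch (RuntimeException e)
        {
            flushError.set(e);   // remember why the flush failed
            throw e;
        }
        finally
        {
            flushed.countDown(); // always unblock the waiting post-flush task
        }
    }

    // Would run on the MemtablePostFlusher stage.
    public void runPostFlush(Runnable discardCommitlogSegments) throws InterruptedException
    {
        flushed.await();
        Throwable failure = flushError.get();
        if (failure != null)
            // Surface the failure instead of recycling commitlog segments for unflushed data.
            throw new RuntimeException("flush failed; keeping commitlog segments", failure);
        discardCommitlogSegments.run();
    }
}
{code}

Even with something like this, the post-flush task still must not recycle commitlog segments for data that never reached disk, which is the concern raised above.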



[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-09-18 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770886#comment-13770886
 ] 

Yuki Morishita commented on CASSANDRA-5605:
---

Ah, right.
Just wondering if we can do something to keep the postExecutor queue from 
filling up.



[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-09-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771002#comment-13771002
 ] 

Jonathan Ellis commented on CASSANDRA-5605:
---

Committed.  If you come up with a good idea there, let's open a new ticket.



[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-09-16 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768372#comment-13768372
 ] 

Jeremiah Jordan commented on CASSANDRA-5605:


After CASSANDRA-4292, flushing checks the space reserved by compactions, so 
check whether you have a bunch of pending compactions with a huge total size.
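
A rough sketch of what that check amounts to (illustrative only, not the actual Directories/DiskAwareRunnable code): the estimated flush size is compared against free space minus whatever running compactions have reserved, so a small flush can be rejected even when the disk itself has plenty of room.

{code:java}
// Simplified illustration only; the real check lives around Directories/DiskAwareRunnable.
public final class FlushSpaceCheck
{
    private FlushSpaceCheck() {}

    /**
     * @param usableBytes    free space on the data directory (roughly what df reports)
     * @param reservedBytes  space already claimed by in-flight compactions, sized from their
     *                       input SSTables, so expired or overwritten data is still counted
     * @param flushSizeBytes estimated on-disk size of the memtable being flushed
     */
    public static boolean hasRoomToFlush(long usableBytes, long reservedBytes, long flushSizeBytes)
    {
        return usableBytes - reservedBytes >= flushSizeBytes;
    }

    public static void main(String[] args)
    {
        // With ~981GB free but one large STCS compaction "reserving" roughly that much,
        // even an ~8MB flush is rejected.
        long free = 981L * 1024 * 1024 * 1024;
        long reservedByCompaction = 981L * 1024 * 1024 * 1024;
        long flushSize = 8042730L;
        System.out.println(hasRoomToFlush(free, reservedByCompaction, flushSize)); // false
    }
}
{code}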



[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-09-16 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768469#comment-13768469
 ] 

Jeremiah Jordan commented on CASSANDRA-5605:


I have seen a lot of reports of people hitting this recently.  I think we 
probably shouldn't fail operations just because C* thinks it might not have 
enough free space once reservations are counted.  It is probably a good idea to 
use that reservation information as a hint for which drive to pick in the JBOD 
case, but if no drive will fit the data after accounting for reservations, pick 
the one with the most free space (like we used to).
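
As a rough illustration of that fallback (hypothetical names; not the real org.apache.cassandra.db.Directories logic): prefer a data directory that still fits the write after subtracting reservations, and if none qualifies, fall back to the directory with the most raw free space instead of failing the flush.

{code:java}
import java.io.File;
import java.util.List;
import java.util.Map;

// Illustrative only; not the real org.apache.cassandra.db.Directories logic.
public final class JbodPicker
{
    private JbodPicker() {}

    /**
     * Prefer a directory that still fits the write after subtracting compaction
     * reservations; if none does, fall back to the one with the most raw free space.
     */
    public static File pickDirectory(List<File> dataDirectories,
                                     Map<File, Long> reservedBytes,
                                     long writeSizeBytes)
    {
        File bestFit = null;   // fits even after reservations
        File mostFree = null;  // fallback: most raw free space
        long bestFitFree = -1;
        long mostFreeBytes = -1;

        for (File dir : dataDirectories)
        {
            long free = dir.getUsableSpace();
            Long reserved = reservedBytes.get(dir);
            long effective = free - (reserved == null ? 0L : reserved);

            if (effective >= writeSizeBytes && free > bestFitFree)
            {
                bestFit = dir;
                bestFitFree = free;
            }
            if (free > mostFreeBytes)
            {
                mostFree = dir;
                mostFreeBytes = free;
            }
        }

        // Reservations act only as a hint; the flush is never refused outright.
        return bestFit != null ? bestFit : mostFree;
    }
}
{code}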



[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-08-07 Thread Aaron Daubman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732098#comment-13732098
 ] 

Aaron Daubman commented on CASSANDRA-5605:
--

Apologies if this isn't directly relevant, but I seem to be experiencing the 
same issue using 1.2.8 launched via ccm for integration testing. One 
differentiating feature here is that this happened the first time the node was 
ever brought up (0 data). The integration tests attempting to use the ccm 
cluster never completed due to this hang, but all they do is create a fairly 
simple schema and then write and read back a single row.

Here's what was in the log:
 INFO [main] 2013-08-07 14:56:46,763 CassandraDaemon.java (line 118) Logging 
initialized
 INFO [main] 2013-08-07 14:56:46,807 CassandraDaemon.java (line 145) JVM 
vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_25
 INFO [main] 2013-08-07 14:56:46,808 CassandraDaemon.java (line 183) Heap size: 
8248098816/8248098816
 INFO [main] 2013-08-07 14:56:46,808 CassandraDaemon.java (line 184) Classpath: 
/opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/conf:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/build/classes/main:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/build/classes/thrift:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/antlr-3.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/avro-1.4.0-fixes.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/avro-1.4.0-sources-fixes.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/commons-cli-1.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/commons-codec-1.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/commons-lang-2.6.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/compress-lzf-0.8.4.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/concurrentlinkedhashmap-lru-1.3.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/guava-13.0.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/high-scale-lib-1.1.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jackson-core-asl-1.9.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jackson-mapper-asl-1.9.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jamm-0.2.5.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jbcrypt-0.3m.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jline-1.0.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/json-simple-1.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/libthrift-0.7.0.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/log4j-1.2.16.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/lz4-1.1.0.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/metrics-core-2.0.3.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/netty-3.5.9.Final.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/servlet-api-2.5-20081211.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/slf4j-api-1.7.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/slf4j-log4j12-1.7.2.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/snakeyaml-1.6.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/snappy-java-1.0.5.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/snaptree-0.1.jar:/opt/ccmlib_cassandra/apache-cassandra-1.2.8-src/lib/jamm-0.2.5.jar
 INFO [main] 2013-08-07 14:56:46,822 CLibrary.java (line 65) JNA not found. 
Native methods will be disabled.
 INFO [main] 2013-08-07 14:56:46,891 DatabaseDescriptor.java (line 132) Loading 
settings from 
file:/opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/conf/cassandra.yaml
 INFO [main] 2013-08-07 14:56:47,821 DatabaseDescriptor.java (line 150) Data 
files directories: [/opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/data]
 INFO [main] 2013-08-07 14:56:47,822 DatabaseDescriptor.java (line 151) Commit 
log directory: /opt/ccmlib_cassandra/ccm/lwcdbng_test_cluster/node1/commitlogs
 INFO [main] 2013-08-07 14:56:47,822 DatabaseDescriptor.java (line 191) 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO [main] 2013-08-07 14:56:47,822 DatabaseDescriptor.java (line 205) 
disk_failure_policy is stop
 INFO [main] 2013-08-07 14:56:48,000 DatabaseDescriptor.java (line 273) Global 
memtable threshold is enabled at 2622MB
 INFO [main] 2013-08-07 14:56:49,142 DatabaseDescriptor.java (line 401) Not 
using multi-threaded compaction
 INFO [main] 2013-08-07 14:56:52,610 CacheService.java (line 111) Initializing 
key cache with capacity of 100 MBs.
 INFO [main] 2013-08-07 14:56:52,675 CacheService.java (line 140) Scheduling 
key cache save to each 14400 seconds (going to save all keys).
 INFO [main] 2013-08-07 14:56:52,678 CacheService.java (line 154) Initializing 
row cache with capacity of 0 MBs and provider 
org.apache.cassandra.cache.SerializingCacheProvider
 INFO [main] 2013-08-07 14:56:52,698 CacheService.java 

[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-07-10 Thread Ananth Gundabattula (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13705442#comment-13705442
 ] 

Ananth Gundabattula commented on CASSANDRA-5605:


I'm not sure if the following information helps, but we hit this issue in 
production today as well. We were running Cassandra 1.2.4 with two patches, 
CASSANDRA-5554 and CASSANDRA-5418. 

We were running with RF=3 and LCS. 

We cross-checked via JMX whether blacklisting was the cause of this bug, and it 
looks like it is definitely not the case. 

We did, however, see a pile-up of pending compactions, roughly 1800 per node, 
when a node crashed. The surprising thing is that the "Insufficient disk space 
to write  bytes" error appears well before the node crashes; for us it started 
appearing approximately 3 hours beforehand. 

The cluster showing this behavior was under a heavy write load (we were using 
multiple SSTableLoaders to stream data into it). We pushed in almost 15 TB 
worth of data (including RF=3) in a matter of 16 hours. We were not serving any 
reads from this cluster, as we were still migrating data to it. 

Another interesting observation was that the affected nodes were neighbors most 
of the time. 

I'm not sure the above information helps, but I wanted to add it to the context 
of the ticket.  



[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-05-31 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671757#comment-13671757
 ] 

Brandon Williams commented on CASSANDRA-5605:
-

Do you have multiple data directories?  If you only have one, it may have been 
blacklisted and marked read-only by a previous issue; can you check the logs 
for anything like that?



[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush

2013-05-31 Thread Dan Hendry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671780#comment-13671780
 ] 

Dan Hendry commented on CASSANDRA-5605:
---

We have only one data directory. There is nothing in the log about it being 
blacklisted.
