[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2014-02-11 Thread Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897791#comment-13897791 ]

Marcus Eriksson commented on CASSANDRA-6364:


+1 on the 2.0-patch, committed

 There should be different disk_failure_policies for data and commit volumes 
 or commit volume failure should always cause node exit
 --

 Key: CASSANDRA-6364
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6364
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
 Environment: JBOD, single dedicated commit disk
Reporter: J. Ryan Earl
Assignee: Benedict
 Fix For: 2.0.6

 Attachments: tmp-2.0.patch


 We're doing fault testing on a pre-production Cassandra cluster.  One of the 
 tests was to simulate failure of the commit volume/disk, which in our case 
 is on a dedicated disk.  We expected failure of the commit volume to be 
 handled somehow, but what we found was that no action was taken by Cassandra 
 when the commit volume failed.  We simulated this simply by pulling the 
 physical disk that backed the commit volume, which resulted in filesystem I/O 
 errors on the mount point.
 What then happened was that the Cassandra heap filled up to the point that it 
 was spending 90% of its time doing garbage collection.  No errors were logged 
 in regard to the failed commit volume.  Gossip on other nodes in the cluster 
 eventually flagged the node as down.  Gossip on the local node showed itself 
 as up, and all other nodes as down.
 The most serious problem was that connections to the coordinator on this node 
 became very slow due to the ongoing GC, as I assume uncommitted writes piled 
 up on the JVM heap.  We believe Cassandra should have caught the I/O error 
 and exited with a useful log message, or otherwise done some sort of useful 
 cleanup.  Otherwise the node goes into a sort of zombie state, spending most 
 of its time in GC, and thus slowing down any transactions that happen to use 
 the coordinator on said node.
 A limit on in-memory, unflushed writes before refusing requests may also 
 work.  Point being, something should be done to handle the commit volume 
 dying, as doing nothing ends up affecting the entire cluster.  I should 
 note, we are using: disk_failure_policy: best_effort





[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2014-02-03 Thread Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889330#comment-13889330 ]

Marcus Eriksson commented on CASSANDRA-6364:


About the ignore case, let's hard-code something for now - rate-limit at one log 
error message per second, perhaps?

I don't think we should default to 'ignore' in Config.java - if someone does a 
minor upgrade they most likely won't check NEWS or update their config files to 
add the new parameter.

The shipped config in cassandra.yaml looks wrong - it should be 
commit_failure_policy, not disk_failure_policy, I guess.
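
For reference, the entry in the shipped cassandra.yaml would then look roughly 
like the sketch below (illustrative only - the exact comment text, option list 
and default are whatever the committed patch ends up shipping):

{code:yaml}
# Sketch only, not the final shipped wording. Failure policy for the commit
# volume, analogous to disk_failure_policy for the data volumes:
#   stop   - shut down the node on a commit log I/O error, so a dead commit
#            volume cannot silently turn into dropped writes
#   ignore - log the error and carry on (closest to the pre-patch behaviour)
commit_failure_policy: stop
{code}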



[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2014-02-03 Thread Benedict (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889343#comment-13889343 ]

Benedict commented on CASSANDRA-6364:
-

bq. I don't think we should default to 'ignore' in Config.java

Well, I wasn't too sure about this. On the one hand, switching the default to 
stop means we could over-cautiously kill users' hosts unexpectedly, maybe 
resulting in interruption of service (especially, say, for our users running on 
SAN, as much as that is strongly discouraged). Whereas switching to ignore 
means we may not be durable. Neither is a great default, but both are better 
than before. I'm comfortable with either, so if you feel strongly it should be 
stop, I'll happily switch it. Perhaps I lean slightly in favour of it too, 
but it depends on whether the user favours durability over availability, so 
there doesn't seem to be a single correct answer to me. Note that the default 
disk_failure_policy is also ignore, and the prior behaviour was closest to 
ignore, so introducing a default that results in a failing node is somewhat 
unprecedented for disk failure.

bq. The shipped config in cassandra.yaml looks wrong, should be 
commit_failure_policy, not disk_failure_policy I guess

Right, looks like I didn't update the first or last lines I copy-pasted. 
Thanks. 

bq. About the ignore case, let's hard-code something for now - rate-limit at one 
log error message per second, perhaps?

If we're just rate-limiting the log messages, I'd say one per minute might be 
better. But I'm not sure having the threads spin trying to make progress is 
useful. The PCLES, for instance, will just start burning one core until it can 
successfully sync, assuming it doesn't actually have to wait each time to 
encounter the error. I'm tempted to have a 1s pause after an error, during which 
we just sleep the erroring thread.
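
As a rough sketch of that combination - rate-limited error logging plus a pause 
for the erroring thread - something like this (hypothetical names, not the 
actual CommitLog classes):

{code:java}
// Hypothetical sketch of the "log at most once per interval, then pause the
// erroring thread" idea; not the actual Cassandra commit log code.
public class SyncErrorHandler
{
    private final long logIntervalNanos;
    private final long pauseMillis;
    private long lastLogNanos;

    public SyncErrorHandler(long logIntervalNanos, long pauseMillis)
    {
        this.logIntervalNanos = logIntervalNanos;
        this.pauseMillis = pauseMillis;
        // Let the very first error be logged immediately.
        this.lastLogNanos = System.nanoTime() - logIntervalNanos;
    }

    // Called from the sync loop whenever a sync attempt throws.
    public synchronized void onSyncError(Throwable t) throws InterruptedException
    {
        long now = System.nanoTime();
        if (now - lastLogNanos >= logIntervalNanos)
        {
            lastLogNanos = now;
            System.err.println("Commit log sync failed (will keep retrying): " + t);
        }
        // Sleep the erroring thread so it doesn't spin burning a core.
        Thread.sleep(pauseMillis);
    }
}
{code}

Wiring it in would just be a call from the sync loop's catch block; one log line 
per minute plus a one-second pause would be new SyncErrorHandler(60L * 1000 * 1000 * 1000, 1000).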

Another issue that slightly concerns me is what happens if the CLES sync() 
starts failing, but the append path and the CLA don't. With ignore this could 
potentially result in us mapping in and allocating huge amounts of disk space, 
but not being able to sync or clear it. This might result in lots of swapping, 
and/or in us exceeding our max log space goal by a large margin. Since we never 
guarantee to keep to this I'm not sure how much of a problem it would be, but an 
error down to ACLs that stops us syncing one file might potentially end up 
eating huge quantities of commit disk space. I'm tempted to have the CLA thread 
block once it hits twice its goal max space (or maybe introduce a second config 
parameter for a hard maximum). But I'm also tempted to leave these changes for 
the 2.1 branch, since it's a fairly specific failure case, and what we have is a 
big improvement over the current state of affairs.
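
To illustrate the hard-cap idea (purely a sketch; the class and parameter names 
are invented for illustration, not a proposed API):

{code:java}
// Illustrative only: block allocation once allocated-but-unsynced commit log
// space exceeds a hard ceiling (here, twice the configured goal).
public class SegmentSpaceGate
{
    private final long goalBytes;
    private long allocatedBytes;

    public SegmentSpaceGate(long goalBytes)
    {
        this.goalBytes = goalBytes;
    }

    // Called before allocating a new segment; blocks while we are over the cap.
    public synchronized void awaitCapacity(long segmentSize) throws InterruptedException
    {
        while (allocatedBytes + segmentSize > 2 * goalBytes)
            wait();
        allocatedBytes += segmentSize;
    }

    // Called when a segment has been synced and recycled or deleted.
    public synchronized void release(long segmentSize)
    {
        allocatedBytes -= segmentSize;
        notifyAll();
    }
}
{code}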



[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2013-12-04 Thread Benedict (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838786#comment-13838786 ]

Benedict commented on CASSANDRA-6364:
-

How far do we want to go with this?

Adding a simple exit on error is very straightforward, but in my experience you 
can have hang-style failures, so we should definitely have a separate thread 
checking the liveness of the CLSegmentManager and CLService. Probably a 
user-configurable not-alive time in the yaml should be used to mark the CL as 
non-responsive if either hasn't heartbeated in that time. We probably don't want 
to die immediately on an error either, but simply stop heartbeating and die if 
the error doesn't recover within some interval, so that anyone monitoring the 
error logs has time to correct the issue (say it's just out of space) before the 
node dies.
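
Roughly what I have in mind for the liveness check (a sketch with hypothetical 
names; the not-alive window would come from the yaml as above):

{code:java}
// Hypothetical sketch of a liveness check for the commit log threads: workers
// call heartbeat() whenever they make progress; a monitor thread kills the node
// if no heartbeat arrives within the configured window. Not the actual code.
public class CommitLogLivenessMonitor implements Runnable
{
    private final long maxSilenceMillis;   // would come from cassandra.yaml
    private volatile long lastHeartbeatMillis = System.currentTimeMillis();

    public CommitLogLivenessMonitor(long maxSilenceMillis)
    {
        this.maxSilenceMillis = maxSilenceMillis;
    }

    // Called by the commit log sync/allocator threads on every successful step.
    public void heartbeat()
    {
        lastHeartbeatMillis = System.currentTimeMillis();
    }

    @Override
    public void run()
    {
        while (true)
        {
            try
            {
                Thread.sleep(1000);
            }
            catch (InterruptedException e)
            {
                return;
            }
            if (System.currentTimeMillis() - lastHeartbeatMillis > maxSilenceMillis)
            {
                System.err.println("Commit log appears hung or erroring; exiting");
                System.exit(1);
            }
        }
    }
}
{code}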

The bigger question is: do we want to do anything clever if we don't want to 
die? Should we start draining the mutation stage and just drop the messages? If 
so, should we attempt to recover if the drive starts responding again after 
draining the mutation stage?





[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2013-12-04 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839383#comment-13839383 ]

Jonathan Ellis commented on CASSANDRA-6364:
---

Just make it die on IOError like the existing code.  For now, people can deal 
with hangs manually instead of errors.
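
For what it's worth, that amounts to something like the sketch below 
(illustrative names only - the real sync path lives in the commit log service 
classes):

{code:java}
import java.nio.MappedByteBuffer;

// Illustrative only: treat any I/O failure while syncing a commit log segment
// as fatal, mirroring how the existing code dies on IOError for the data disks.
public class DieOnCommitLogError
{
    public static void syncOrDie(MappedByteBuffer segmentBuffer)
    {
        try
        {
            segmentBuffer.force(); // flush the mmapped segment to disk
        }
        catch (Throwable t)
        {
            System.err.println("Commit log sync failed; shutting down node: " + t);
            System.exit(1);
        }
    }
}
{code}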



[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

2013-11-20 Thread Aleksey Yeschenko (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828064#comment-13828064 ]

Aleksey Yeschenko commented on CASSANDRA-6364:
--

I think I agree w/ stopping upon getting a write error on the commitlog drive.
