Faulty hd kills cluster performance
-----------------------------------

Key: CASSANDRA-2394
URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.4
Reporter: Thibaut
Priority: Minor
Hi,

About once a week, a node in our main cluster (>100 nodes) develops a faulty hard disk (listing the Cassandra data storage directory triggers an input/output error). Whenever this happens, our application sees many TimeoutExceptions on various nodes, and everything runs very slowly. Key range scans time out and sometimes never succeed. If I stop Cassandra on the faulty node, everything returns to normal.

It would be great to have some kind of monitoring thread in Cassandra that marks a node as "down" if there are multiple read/write errors on its data directories. A single faulty hard disk on one node shouldn't affect global cluster performance.
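To make the suggestion concrete, here is a minimal sketch of such a monitor, written under stated assumptions. It is not Cassandra code, and every name in it (`DiskErrorMonitor`, `recordError`, `probe`, `start`, the threshold and window parameters) is hypothetical. The idea: a background thread periodically lists the data directory, each I/O error is recorded with a timestamp, and once `threshold` errors fall within a sliding window of `windowMillis`, a callback fires exactly once. What that callback does to take the node out of service (for example, stopping the process, as is currently done by hand) is left open.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not part of Cassandra: fires `onDown` once when
// `threshold` I/O errors on a data directory occur within `windowMillis`.
class DiskErrorMonitor {
    private final int threshold;
    private final long windowMillis;
    private final Runnable onDown;
    private final Deque<Long> errorTimes = new ArrayDeque<>();
    private final AtomicBoolean down = new AtomicBoolean(false);

    DiskErrorMonitor(int threshold, long windowMillis, Runnable onDown) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
        this.onDown = onDown;
    }

    // Records one I/O error observed at nowMillis.
    synchronized void recordError(long nowMillis) {
        errorTimes.addLast(nowMillis);
        // Forget errors that have fallen out of the sliding window.
        while (!errorTimes.isEmpty() && nowMillis - errorTimes.peekFirst() > windowMillis) {
            errorTimes.pollFirst();
        }
        // compareAndSet guarantees the callback runs at most once.
        if (errorTimes.size() >= threshold && down.compareAndSet(false, true)) {
            onDown.run();
        }
    }

    // Lists the directory, the same operation that fails on the faulty disk.
    // Returns false and records an error if listing throws.
    boolean probe(Path dataDir, long nowMillis) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dataDir)) {
            for (Path ignored : entries) {
                // Iterating forces the directory to actually be read.
            }
            return true;
        } catch (IOException | DirectoryIteratorException e) {
            recordError(nowMillis);
            return false;
        }
    }

    boolean isDown() {
        return down.get();
    }

    // Runs probe() every periodMillis on a dedicated monitoring thread.
    // The caller owns the returned executor and must shut it down.
    ScheduledExecutorService start(Path dataDir, long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> probe(dataDir, System.currentTimeMillis()),
                0, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Usage would be along the lines of `new DiskErrorMonitor(3, 60_000, action).start(dataDir, 5_000)`. The sliding window is a deliberate choice: a single transient error does not take the node down, but repeated errors within a short span (the symptom described above) do.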