[ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012057#comment-13012057 ]
Gary Dusbabek commented on CASSANDRA-2394:
------------------------------------------

This probably belongs in 0.8.

> Faulty hd kills cluster performance
> -----------------------------------
>
>                 Key: CASSANDRA-2394
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>            Priority: Minor
>             Fix For: 0.7.5
>
>
> Hi,
> About once a week, a node in our main cluster (>100 nodes) gets a faulty hard disk
> (listing the Cassandra data storage directory triggers an input/output error).
> Whenever this happens, I see many timeout exceptions in our application on
> various nodes, and everything runs very slowly. Key range scans
> time out and sometimes never succeed. If I stop Cassandra on the
> faulty node, everything runs normally again.
> It would be great to have some kind of monitoring thread in Cassandra that
> marks a node as "down" if there are multiple read/write errors on the data
> directories. A single faulty hard disk on one node shouldn't affect global cluster
> performance.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
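The monitoring thread proposed in the report could look roughly like the following. This is a minimal, hypothetical sketch, not Cassandra code: the class `DiskErrorMonitor`, the error threshold, and the `onFailure` callback are assumptions. It periodically tries to list each data directory (the operation the reporter saw failing), also lets read/write paths report I/O errors, and runs the callback exactly once when the error count reaches the threshold.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch (not part of Cassandra): counts I/O errors on data
 *  directories and runs a callback once when a threshold is reached. */
final class DiskErrorMonitor {
    private final List<Path> dataDirs;
    private final int threshold;
    private final Runnable onFailure; // assumption: whatever "mark this node down" means
    private final AtomicInteger errors = new AtomicInteger();
    private final AtomicBoolean tripped = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "disk-error-monitor");
                t.setDaemon(true); // don't keep the JVM alive just for monitoring
                return t;
            });

    DiskErrorMonitor(List<Path> dataDirs, int threshold, Runnable onFailure) {
        this.dataDirs = List.copyOf(dataDirs);
        this.threshold = threshold;
        this.onFailure = onFailure;
    }

    /** Read/write paths would call this whenever a data-directory I/O error occurs. */
    void recordError() {
        // Cumulative count; the callback fires only on the first crossing.
        if (errors.incrementAndGet() >= threshold && tripped.compareAndSet(false, true)) {
            onFailure.run();
        }
    }

    /** One probe: list every data directory and count each one that fails. */
    void probeOnce() {
        for (Path dir : dataDirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path ignored : entries) { /* iterate so read errors surface */ }
            } catch (IOException | DirectoryIteratorException e) {
                recordError();
            }
        }
    }

    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::probeOnce, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() { scheduler.shutdownNow(); }

    boolean isTripped() { return tripped.get(); }

    int errorCount() { return errors.get(); }
}
```

The counter never decays, which keeps the sketch short; a real implementation would more likely count errors within a sliding time window so that a few transient errors spread over months don't take a node out of service.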