[ https://issues.apache.org/jira/browse/CASSANDRA-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169707#comment-14169707 ]
Joshua McKenzie commented on CASSANDRA-7927:
--------------------------------------------

The previously linked branch had a couple of problems, which I've resolved [here|https://github.com/josh-mckenzie/cassandra/compare/7927?expand=1]. Namely, when I combined the checks for FSError / CorruptSSTableException in inspectThrowable, I didn't check the commit log failure policy in the DatabaseDescriptor, and I couldn't have done so without augmenting the information passed in to indicate that the error originated in a CommitLog context. I think you were on the right track w/having an independent entry point for inspection of CommitLog errors - that way we can kill the JVM on *any* commit log error without having to worry about the type of error thrown by the CommitLog operation.

I did a few other things on this branch as well:
# added an entry in CHANGES.txt
# added an assertion to CommitLogTest to confirm the _die actually worked
# added a workaround for the fact that File.setWritable(false) on a directory fails on Windows (/sigh)
# merged the KillerForTests into the JVMStabilityInspector to help keep the code-base clean
# promoted the inspection of the Throwable in FileUtils and in CommitLog to the root of handleFSError / handleCorruptSSTable / handleCommitError, so the inspector will kill immediately if appropriate

> Kill daemon on any disk error
> -----------------------------
>
>                 Key: CASSANDRA-7927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7927
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>         Environment: aws, stock cassandra or dse
>            Reporter: John Sumsion
>            Assignee: John Sumsion
>              Labels: bootcamp, lhf
>             Fix For: 2.1.1
>
>         Attachments: 7927-v1-die.patch
>
>
> We got a disk read error on 1.2.13 that didn't trigger the disk failure
> policy, and I'm trying to hunt down why, but in doing so, I saw that there is
> no disk_failure_policy option for just killing the daemon.
> If we ever get a corrupt sstable, we want to replace the node anyway, because
> some aws instance store disks just go bad.
> I want to use the JVMStabilityInspector from CASSANDRA-7507 to kill so that
> remains standard, so I will base my patch on CASSANDRA-7507.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
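
For readers following the comment above, here is a minimal sketch of what an independent entry point for CommitLog errors might look like. It is not the code on the linked branch: the class name StabilityInspectorSketch, the CommitFailurePolicy enum and its values, the replaceKiller helper, and the choice of OutOfMemoryError as the generic fatal case are all assumptions made for illustration. The point is only that the commit log path consults the failure policy and ignores the type of the Throwable, while a swappable killer lets a test confirm the kill without exiting the JVM.

```java
import java.io.IOException;

/** Hypothetical sketch; names are illustrative and not the actual Cassandra code. */
final class StabilityInspectorSketch {
    /** Assumed stand-in for the configured commit log failure policy. */
    enum CommitFailurePolicy { DIE, STOP, IGNORE }

    /** Swappable so tests can record a kill request instead of exiting. */
    static class Killer {
        void killCurrentJVM(Throwable t) {
            System.err.println("Exiting due to: " + t);
            System.exit(100);
        }
    }

    /** Test double: remembers that a kill was requested. */
    static final class KillerForTests extends Killer {
        private boolean killed;
        @Override void killCurrentJVM(Throwable t) { killed = true; }
        boolean wasKilled() { return killed; }
    }

    private static Killer killer = new Killer();

    /** Installs a new killer and returns the previous one (hypothetical helper). */
    static Killer replaceKiller(Killer newKiller) {
        Killer old = killer;
        killer = newKiller;
        return old;
    }

    /** Generic entry point: kills only on errors assumed to be fatal by type. */
    static void inspectThrowable(Throwable t) {
        if (t instanceof OutOfMemoryError) {
            killer.killCurrentJVM(t);
        }
    }

    /** Commit log entry point: the policy decides, whatever the error type. */
    static void inspectCommitLogThrowable(Throwable t, CommitFailurePolicy policy) {
        if (policy == CommitFailurePolicy.DIE) {
            killer.killCurrentJVM(t);
        } else {
            inspectThrowable(t);
        }
    }

    public static void main(String[] args) {
        KillerForTests recorder = new KillerForTests();
        Killer original = replaceKiller(recorder);
        try {
            // An ordinary IOException is not fatal by type, but the policy says die.
            inspectCommitLogThrowable(new IOException("commit log write failed"),
                                      CommitFailurePolicy.DIE);
            System.out.println("kill requested: " + recorder.wasKilled());
        } finally {
            replaceKiller(original);
        }
    }
}
```

Keeping the policy check in a separate method means the generic inspectThrowable path needs no extra context argument to know that an error came from the commit log.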