[jira] [Commented] (CASSANDRA-7927) Kill daemon on any disk error

Joshua McKenzie (JIRA) Thu, 09 Oct 2014 12:26:49 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165600#comment-14165600 ]


Joshua McKenzie commented on CASSANDRA-7927:
--------------------------------------------

Sorry for the delay on this. I have a version rebased onto the 2.1 head [available here|https://github.com/josh-mckenzie/cassandra/compare/7927?expand=1]:
* Added support for a "die" policy to CommitLog exception handling
* Removed the 'killMeNow' method from StabilityInspector
* Migrated the FileUtil killing logic into the StabilityInspector
* Slightly refactored JVMStabilityInspector to keep it a single point of entry (hand it a Throwable and let it deal with it)
* Updated the unit tests to work with the new structure
* Removed erroneously added entries from Config.CommitFailurePolicy
* Reverted the enum ordering in Config so the new entry is simply appended at the end
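A minimal Java sketch of the commit-failure change, assuming a simplified, hypothetical mirror of Config.CommitFailurePolicy. The Action enum and the onCommitFailure method are illustrative names, not Cassandra's API. One plausible reason to append the new entry rather than reorder: Java enum ordinals follow declaration order, so appending die leaves the ordinals of the existing entries unchanged for any code that relies on them.

```java
public class CommitFailurePolicySketch {
    // Hypothetical mirror of Config.CommitFailurePolicy. The new "die" entry is
    // appended at the end, so stop, stop_commit and ignore keep their ordinals.
    enum CommitFailurePolicy { stop, stop_commit, ignore, die }

    // Hypothetical outcomes; returning them instead of acting keeps the sketch testable.
    enum Action { STOP_NODE, STOP_COMMIT_LOG, LOG_AND_IGNORE, KILL_JVM }

    // Map a commit log failure to an action according to the configured policy.
    // The Throwable is unused here; real handling code would log it.
    static Action onCommitFailure(CommitFailurePolicy policy, Throwable t) {
        switch (policy) {
            case die:
                // In the approach described above, this is where the Throwable
                // would be handed to the stability inspector to kill the JVM.
                return Action.KILL_JVM;
            case stop:
                return Action.STOP_NODE;
            case stop_commit:
                return Action.STOP_COMMIT_LOG;
            case ignore:
                return Action.LOG_AND_IGNORE;
            default:
                throw new AssertionError("unhandled policy: " + policy);
        }
    }

    public static void main(String[] args) {
        Throwable err = new java.io.IOException("commit log write failed");
        for (CommitFailurePolicy p : CommitFailurePolicy.values())
            System.out.println(p + " (ordinal " + p.ordinal() + ") -> " + onCommitFailure(p, err));
    }
}
```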

Regarding migrating the logic into JVMStabilityInspector: I expect very few exception conditions will cause us to mark the JVM as unstable and kill it, so I'd prefer to keep that class as simple as possible and nest that logic inside it, rather than spreading it throughout the codebase by exposing a 'killMeNow'-style method. Hand it a Throwable, and it will kill things if they need to be killed.
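The single-point-of-entry design can be sketched roughly as follows. This is a hypothetical, simplified model, not Cassandra's actual JVMStabilityInspector: the DiskFailurePolicy subset, the stand-in FSError class, the Killer interface, and the choice to always treat OutOfMemoryError as fatal are assumptions made for illustration. Callers only ever call inspectThrowable; whether to kill is decided entirely inside the class.

```java
import java.io.IOError;
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class StabilityInspectorSketch {
    // Hypothetical subset of disk failure policies; only "die" kills on disk errors.
    enum DiskFailurePolicy { stop, best_effort, ignore, die }

    // Hypothetical stand-in for a filesystem error type.
    static class FSError extends IOError {
        FSError(Throwable cause) { super(cause); }
    }

    // How the process is actually killed; injected so tests do not exit the JVM.
    interface Killer { void kill(Throwable reason); }

    private final DiskFailurePolicy diskPolicy;
    private final Killer killer;

    StabilityInspectorSketch(DiskFailurePolicy diskPolicy, Killer killer) {
        this.diskPolicy = diskPolicy;
        this.killer = killer;
    }

    // Single point of entry: callers hand over any Throwable and the inspector
    // alone decides whether the JVM is unstable and must be killed.
    void inspectThrowable(Throwable t) {
        // Walk the cause chain so a fatal error wrapped in another exception is
        // still noticed; the identity set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            boolean fatal = cur instanceof OutOfMemoryError
                    || (cur instanceof FSError && diskPolicy == DiskFailurePolicy.die);
            if (fatal) {
                killer.kill(cur);
                return;
            }
        }
    }

    public static void main(String[] args) {
        // Production-style killer: halt without running shutdown hooks
        // (the exit code 100 is an arbitrary choice for this sketch).
        Killer halt = reason -> {
            System.err.println("JVM unstable, halting: " + reason);
            Runtime.getRuntime().halt(100);
        };
        StabilityInspectorSketch inspector = new StabilityInspectorSketch(DiskFailurePolicy.stop, halt);
        // Under "stop" this sketch does not treat a disk error as fatal, so it returns.
        inspector.inspectThrowable(new RuntimeException(new FSError(new IOException("bad sector"))));
        System.out.println("still running");
    }
}
```

Injecting the Killer keeps the kill decision unit-testable without exiting the test JVM, which fits the goal of updating the unit tests alongside the refactor.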

[~jdsumsion]: could you review the revised branch posted above?  Thanks!

> Kill daemon on any disk error
> -----------------------------
>
>                 Key: CASSANDRA-7927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7927
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>         Environment: aws, stock cassandra or dse
>            Reporter: John Sumsion
>            Assignee: John Sumsion
>              Labels: bootcamp, lhf
>             Fix For: 2.1.1
>
>         Attachments: 7927-v1-die.patch
>
>
> We got a disk read error on 1.2.13 that didn't trigger the disk failure policy. I'm trying to hunt down why, but in doing so I saw that there is no disk_failure_policy option for simply killing the daemon.
> If we ever get a corrupt sstable, we want to replace the node anyway, because some AWS instance store disks just go bad.
> I want to use the JVMStabilityInspector from CASSANDRA-7507 to do the killing so that it stays standard, so I will base my patch on CASSANDRA-7507.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
