[ https://issues.apache.org/jira/browse/CASSANDRA-20363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931422#comment-17931422 ]
Stefan Miklosovic edited comment on CASSANDRA-20363 at 2/28/25 10:16 AM:
-------------------------------------------------------------------------
[~tommy_s] try to take a look at this (1)
(1) https://github.com/apache/cassandra/pull/3933/files
What I did there is extract all the "failure logic" into one class. Your
task would be to write another implementation of DiskErrorObserver and act
accordingly. Every error goes to that instance, so detecting multiple errors
coming from different methods, and starting or stopping monitoring threads,
should be much easier.
It also makes the code clearer if we centralize all of that in one place.
Try to focus just on that implementation for now; we can always make it
configurable later, once you are done with your proof of concept.
cc [~brandon.williams]
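To make the suggestion concrete, here is a minimal sketch of what such an observer could look like. Everything in it is an assumption for illustration: the DiskErrorObserver interface below is a stand-in with a made-up onDiskError method (the real interface in the PR may look different), and the health probe and the stop / resume callbacks are hypothetical hooks, not Cassandra APIs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the interface from the PR; the real signature may differ.
interface DiskErrorObserver {
    void onDiskError(Throwable error);
}

// Sketch: stop services on the first error, then poll until the disk is healthy again.
final class RecoveringDiskErrorObserver implements DiskErrorObserver, AutoCloseable {
    private final BooleanSupplier diskHealthy; // hypothetical probe, e.g. a test write
    private final Runnable stopServices;       // hypothetical hook: close gossip and transports
    private final Runnable resumeServices;     // hypothetical hook: re-open gossip and transports
    private final long pollMillis;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-monitor");
            t.setDaemon(true);
            return t;
        });
    private ScheduledFuture<?> monitor; // non-null while degraded; guarded by "this"

    RecoveringDiskErrorObserver(BooleanSupplier diskHealthy, Runnable stopServices,
                                Runnable resumeServices, long pollMillis) {
        this.diskHealthy = diskHealthy;
        this.stopServices = stopServices;
        this.resumeServices = resumeServices;
        this.pollMillis = pollMillis;
    }

    @Override
    public synchronized void onDiskError(Throwable error) {
        if (monitor != null)
            return; // already degraded: further errors do not start a second monitor
        stopServices.run();
        monitor = scheduler.scheduleWithFixedDelay(this::probe, pollMillis, pollMillis,
                                                   TimeUnit.MILLISECONDS);
    }

    private synchronized void probe() {
        if (monitor == null)
            return;
        boolean healthy;
        try {
            healthy = diskHealthy.getAsBoolean();
        } catch (RuntimeException e) {
            healthy = false; // a failing probe counts as "still broken"
        }
        if (!healthy)
            return;
        monitor.cancel(false); // stop polling; the current run still completes
        monitor = null;
        resumeServices.run();
    }

    synchronized boolean isMonitoring() {
        return monitor != null;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because all errors arrive at a single instance, the "already degraded" check is one field comparison, and there is never more than one monitoring task running.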
> Introduce a robust way to intercept FSError and commit log errors
> -----------------------------------------------------------------
>
> Key: CASSANDRA-20363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20363
> Project: Apache Cassandra
> Issue Type: New Feature
> Components: Legacy/Core
> Reporter: Tommy Stendahl
> Assignee: Tommy Stendahl
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Add a Java property to override the DefaultFSErrorHandler with a custom
> implementation.
> The use case I am looking at is a customer deployment that uses network
> disks, and these can go off-line sometimes. I would like to use
> "disk_failure_policy: stop" but automatically detect when the disk is on-line
> again and just open gossip and transports, so the node comes back UP without
> triggering a restart of the node.
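
For illustration only, overriding a default handler through a Java system property could be sketched roughly as below. The property name "example.fs_error_handler_class", the FsErrorHandler interface and both handler classes are hypothetical stand-ins invented for this sketch; they are not the names used by Cassandra or by the patch.

```java
// Hypothetical stand-in for the handler interface; not the Cassandra type.
interface FsErrorHandler {
    String name();
}

// Used when the property is not set.
final class DefaultHandler implements FsErrorHandler {
    @Override
    public String name() {
        return "default";
    }
}

// Example of a custom implementation selected via the property.
final class CustomHandler implements FsErrorHandler {
    @Override
    public String name() {
        return "custom";
    }
}

final class FsErrorHandlers {
    // Hypothetical property name, chosen only for this sketch.
    static final String PROPERTY = "example.fs_error_handler_class";

    // Returns the handler named by the property, or the default one if it is unset.
    static FsErrorHandler load() {
        String className = System.getProperty(PROPERTY);
        if (className == null || className.isEmpty())
            return new DefaultHandler();
        try {
            return Class.forName(className)
                        .asSubclass(FsErrorHandler.class) // rejects unrelated classes
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With this shape, starting the JVM with -Dexample.fs_error_handler_class=CustomHandler would select the custom class, and a misconfigured class name fails fast at load time instead of silently falling back to the default.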