[ 
https://issues.apache.org/jira/browse/CASSANDRA-20363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931422#comment-17931422
 ] 

Stefan Miklosovic edited comment on CASSANDRA-20363 at 2/28/25 10:16 AM:
-------------------------------------------------------------------------

[~tommy_s] please take a look at this (1)

(1) https://github.com/apache/cassandra/pull/3933/files

What I did there was extract all the "failure logic" into one class. So your 
task would be to code up another implementation of DiskErrorObserver and act 
accordingly. Everything will go to that instance, so detecting multiple errors 
coming from different methods and starting / stopping monitoring threads should 
be much easier.

Centralizing all of that in one place also makes the code clearer.

Try to focus just on the implementation for now; we can always make it 
configurable later, once you are done with your proof of concept.
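
To make the idea of a second observer implementation more concrete, here is a minimal sketch of how it might look. It is not taken from the PR: the real DiskErrorObserver interface may have a different shape, and DiskErrorObserverSketch, RecoveringDiskObserver, the injected callbacks and the probe interval are all hypothetical names chosen for illustration. The sketch stops services on the first reported error, starts a single monitoring task, ignores further errors while that task is active, and starts services again once the health check passes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the observer interface from the PR (assumption:
// the real DiskErrorObserver may expose different methods).
interface DiskErrorObserverSketch {
    void onDiskError(Throwable cause);
}

// Sketch of an observer that reacts to the first error and then polls
// until the disk looks healthy again.
final class RecoveringDiskObserver implements DiskErrorObserverSketch {
    private final BooleanSupplier diskHealthy; // hypothetical health check
    private final Runnable stopServices;       // e.g. stop gossip and transports
    private final Runnable startServices;      // e.g. start gossip and transports
    private final long probeIntervalMillis;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-recovery-probe");
            t.setDaemon(true); // do not keep the JVM alive for the probe
            return t;
        });
    private ScheduledFuture<?> probe; // guarded by "this"; null when idle

    RecoveringDiskObserver(BooleanSupplier diskHealthy,
                           Runnable stopServices,
                           Runnable startServices,
                           long probeIntervalMillis) {
        this.diskHealthy = diskHealthy;
        this.stopServices = stopServices;
        this.startServices = startServices;
        this.probeIntervalMillis = probeIntervalMillis;
    }

    @Override
    public synchronized void onDiskError(Throwable cause) {
        // A real implementation would likely log or inspect the cause.
        if (probe != null)
            return; // already monitoring; errors from other call sites collapse here
        stopServices.run();
        probe = scheduler.scheduleWithFixedDelay(this::probeOnce,
                                                 probeIntervalMillis,
                                                 probeIntervalMillis,
                                                 TimeUnit.MILLISECONDS);
    }

    private synchronized void probeOnce() {
        if (probe == null || !diskHealthy.getAsBoolean())
            return; // still unhealthy, try again on the next tick
        probe.cancel(false); // stop monitoring before resuming services
        probe = null;
        startServices.run();
    }

    synchronized boolean isMonitoring() {
        return probe != null;
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Because every error funnels through one synchronized entry point, the "is a monitoring thread already running" question reduces to a single null check. In a real node the health check could be something as simple as Files.isWritable on a data directory, but which check is appropriate is left open here.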

cc [~brandon.williams]



> Introduce a robust way to intercept FSError and commit log errors
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-20363
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20363
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Legacy/Core
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a Java property to override the DefaultFSErrorHandler with a custom 
> implementation.
> The use case I am looking at is a customer deployment that uses network 
> disks, which can sometimes go offline. I would like to use 
> "disk_failure_policy: stop" but automatically detect when the disk is online 
> again and simply re-enable gossip and transports, so the node comes back UP 
> without requiring a restart.


