[jira] [Comment Edited] (CASSANDRA-20363) Add option to set a custom FSErrorHandler

Stefan Miklosovic (Jira) Wed, 26 Feb 2025 10:22:03 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-20363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930780#comment-17930780 ]

Stefan Miklosovic edited comment on CASSANDRA-20363 at 2/26/25 6:11 PM:
------------------------------------------------------------------------

Well ... the primary motivation for doing it this way is that there might, in 
theory, be different implementations of this interface, and a user _might_ 
want to pass configuration properties to them, which ParameterizedClass 
enables. With just a simple system property, we would lose the ability to 
configure the handler.
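
To make that concrete, here is a minimal sketch of a handler that is built 
from key/value parameters, the kind of map a ParameterizedClass-style entry 
could carry. Everything in it is an assumption for illustration: the 
FsErrorHandlerSketch interface is a stand-in and not Cassandra's real 
interface, and the parameter names "probe_path" and "check_interval_ms" are 
made up.

```java
import java.util.Map;

// Hypothetical stand-in for illustration; this is NOT Cassandra's FSErrorHandler interface.
interface FsErrorHandlerSketch {
    void handleFSError(Throwable error);
}

// A handler configured from a parameter map instead of from a bare class name.
final class ConfigurableHandlerSketch implements FsErrorHandlerSketch {
    final String probePath;      // hypothetical parameter: which directory to watch
    final long checkIntervalMs;  // hypothetical parameter: how often to probe it
    int errorsSeen = 0;

    ConfigurableHandlerSketch(Map<String, String> parameters) {
        // Both keys are invented for this example; a required key fails fast.
        String path = parameters.get("probe_path");
        if (path == null) {
            throw new IllegalArgumentException("probe_path is required");
        }
        this.probePath = path;
        // Optional key with an arbitrary default of 5 seconds.
        this.checkIntervalMs = Long.parseLong(parameters.getOrDefault("check_interval_ms", "5000"));
    }

    @Override
    public void handleFSError(Throwable error) {
        errorsSeen++;
        System.out.printf("FS error (%s); would probe %s every %d ms%n",
                          error.getMessage(), probePath, checkIntervalMs);
    }
}
```

A system property could only name the class; the map is what lets two 
deployments run the same implementation with different settings.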

Tommy also wrote:

{code}
    // if disk_failure_policy is stop and problem is disk unavailable schedule a task
    // that monitor when disk is available again and open gossip and transports
{code}

That got me thinking ... what does it actually mean "to schedule a task"? An 
error arrives at the "handleFSError" method, and then what does he do? 1) How 
does he tell that the disk is unavailable? 2) How is the actual scheduling 
done? If he just starts a thread by submitting a task to some thread pool, will 
that thread loop until "the disk is back again"? How does he know which disk to 
look at? Shouldn't this be configurable? Etc. etc ... 
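
One possible answer to the scheduling question, purely as a guess on my side 
and not Tommy's implementation: instead of a thread that loops, a one-shot 
probe is scheduled and reschedules itself until the directory is usable again. 
The class name, the use of Files.isWritable as the availability check, and 
the onRecovered callback (standing in for "open gossip and transports") are 
all assumptions; none of these names exist in Cassandra.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper a custom handler could delegate to from handleFSError.
final class DiskRecoveryMonitorSketch {
    private final Path probeDir;        // which disk to look at (would come from configuration)
    private final long intervalMs;      // delay between probes
    private final Runnable onRecovered; // stand-in for re-enabling gossip and transports
    private final AtomicBoolean monitoring = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-recovery-probe");
            t.setDaemon(true); // do not keep the JVM alive just for probing
            return t;
        });

    DiskRecoveryMonitorSketch(Path probeDir, long intervalMs, Runnable onRecovered) {
        this.probeDir = probeDir;
        this.intervalMs = intervalMs;
        this.onRecovered = onRecovered;
    }

    // Called on a disk error; repeated calls while a probe is pending are ignored.
    void onDiskFailure() {
        if (monitoring.compareAndSet(false, true)) {
            scheduler.schedule(this::probe, intervalMs, TimeUnit.MILLISECONDS);
        }
    }

    boolean isMonitoring() {
        return monitoring.get();
    }

    // One-shot probe: either report recovery or schedule the next attempt.
    private void probe() {
        if (diskAvailable()) {
            monitoring.set(false);
            onRecovered.run();
        } else {
            scheduler.schedule(this::probe, intervalMs, TimeUnit.MILLISECONDS);
        }
    }

    // Assumption: "available" means the directory exists and is writable.
    // A real network disk may need a stronger check, e.g. an actual test write.
    private boolean diskAvailable() {
        return Files.isDirectory(probeDir) && Files.isWritable(probeDir);
    }
}
```

Even this small sketch needs a path and an interval from somewhere, which is 
the reason I keep asking about configurability.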

To summarize, I think what I am doing now is "guessing" what Tommy's 
implementation would look like. I would appreciate having the full picture 
here so we can tailor the solution to Tommy's needs instead of building 
something Tommy will have a hard time integrating with.

[~tommy_s] would you mind sharing with us what your custom implementation 
would look like? 

> Add option to set a custom FSErrorHandler
> -----------------------------------------
>
>                 Key: CASSANDRA-20363
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20363
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Legacy/Core
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a Java property to override the DefaultFSErrorHandler with a custom 
> implementation.
> The use case I am looking at is a customer deployment that uses network 
> disks, which can sometimes go offline. I would like to use 
> "disk_failure_policy: stop" but automatically detect when the disk is online 
> again and just open gossip and transports, so the node comes back UP without 
> triggering a restart.
