[ https://issues.apache.org/jira/browse/CASSANDRA-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-8126:
------------------------------------
    Description: 
Our disk failure modes are great in most circumstances, but there are a couple 
where they may not make sense.

Take the example of trying to snapshot your data on a node.  If permissions 
aren't set up properly, the snapshot may fail, which triggers a disk failure, 
which brings down the server.

On the other hand, if you're trying to truncate a table, it may make sense to 
bring down the node if it's unable to snapshot, because it then can't make the 
expected hardlink backup of the data that's getting deleted.  This may be 
debatable.

Perhaps in certain cases we can simply throw obvious errors and not bring down 
the server.  In other cases, such as the truncate case, we should be clear about 
why we are bringing down the server, perhaps with a dedicated message that says 
why it's going down.  A dedicated message matters because it's not obvious to 
users why a truncate would bring down any nodes in their cluster.

  was:
Our disk failure modes are great in most circumstances, but there are a couple 
where they may not make sense.

Take the example of trying to snapshot your data on a node.  If permissions 
aren't set up properly, the snapshot may fail which triggers a disk failure 
which brings down the server.

On the other hand, if you're trying to truncate a table, it may make sense to 
bring down the node if it's unable to snapshot because it's unable to properly 
make a hardlink backup of the data that's getting deleted - which is the 
expectation.  This may be debatable.

Perhaps in certain cases we can simply throw obvious errors and not bring down 
the server.  In other cases, we should be clear about why we are bringing down 
the server - perhaps for specific cases like the second case, having a special 
output to indicate why it's going down.  I say special output because it's not 
obvious why truncate to bring down any nodes in their cluster.


> Review disk failure mode handling
> ---------------------------------
>
>                 Key: CASSANDRA-8126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8126
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jeremy Hanna
>
> Our disk failure modes are great in most circumstances, but there are a 
> couple where they may not make sense.
> Take the example of trying to snapshot your data on a node.  If permissions 
> aren't set up properly, the snapshot may fail, which triggers a disk failure, 
> which brings down the server.
> On the other hand, if you're trying to truncate a table, it may make sense to 
> bring down the node if it's unable to snapshot, because it then can't make 
> the expected hardlink backup of the data that's getting deleted.  This may be 
> debatable.
> Perhaps in certain cases we can simply throw obvious errors and not bring 
> down the server.  In other cases, such as the truncate case, we should be 
> clear about why we are bringing down the server, perhaps with a dedicated 
> message that says why it's going down.  A dedicated message matters because 
> it's not obvious to users why a truncate would bring down any nodes in their 
> cluster.
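The last paragraph of the description suggests handling a snapshot failure differently depending on which operation requested the snapshot. A minimal Java sketch of that idea follows, assuming a pure decision function that the caller then acts on. Every name in it (`SnapshotFailurePolicy`, `Origin`, `Action`, `Decision`, `decide`) is hypothetical and does not exist in Cassandra; which origins should actually stop the node is the open question this issue raises.

```java
import java.io.IOException;

// Hypothetical sketch: none of these names exist in Cassandra. It only shows
// choosing between "report an error" and "stop with an explicit reason"
// based on which operation requested the failed snapshot.
final class SnapshotFailurePolicy {

    /** Which operation asked for the snapshot (hypothetical). */
    enum Origin { USER_SNAPSHOT, TRUNCATE }

    /** What the caller should do about the failure (hypothetical). */
    enum Action { REPORT_ERROR, STOP_NODE }

    /** The chosen action plus a message explaining it to the operator. */
    static final class Decision {
        final Action action;
        final String message;

        Decision(Action action, String message) {
            this.action = action;
            this.message = message;
        }
    }

    private SnapshotFailurePolicy() {}

    static Decision decide(Origin origin, IOException cause) {
        switch (origin) {
            case USER_SNAPSHOT:
                // An operator-requested snapshot failed (e.g. bad permissions):
                // surface a clear error but keep the node running.
                return new Decision(Action.REPORT_ERROR,
                        "Snapshot failed, node stays up: " + cause.getMessage());
            case TRUNCATE:
                // Truncate relies on the hardlink snapshot as its backup, so
                // stopping may be justified, but the message says exactly why.
                return new Decision(Action.STOP_NODE,
                        "Stopping node: truncate could not snapshot the data it is about to delete: "
                                + cause.getMessage());
            default:
                throw new IllegalArgumentException("Unknown origin: " + origin);
        }
    }
}
```

Keeping the decision separate from the act of stopping the node makes it easy to log the message first, so an operator sees why a truncate took a node down.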



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
