[ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021188#comment-13021188
 ] 

Peter Schuller commented on CASSANDRA-2405:
-------------------------------------------

The best solution I can think of is to initialize the last-repair information 
at CF creation with the time the CF was created on the node. If the node was 
bootstrapped as usual, bootstrap would have happened after the local CF 
creation. If it was not (e.g. forcibly inserted into the ring), then some 
operator has explicitly made the choice of entering it into the ring 
"inconsistently" anyway, so it doesn't matter.
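
To make the idea concrete, here is a minimal sketch in Python (not Cassandra's 
actual code; the class and method names are hypothetical and exist only for 
illustration). It seeds the "last successful repair" value with the CF creation 
time, so a query always returns a valid age:

```python
import time


class RepairTracker:
    """Hypothetical per-node record of the last successful repair per CF."""

    def __init__(self, clock=time.time):
        # clock is injectable so the behaviour can be tested deterministically.
        self._clock = clock
        self._last_repair = {}

    def on_cf_created(self, cf_name):
        # Seed with the local creation time; never overwrite an existing entry.
        self._last_repair.setdefault(cf_name, self._clock())

    def on_repair_success(self, cf_name):
        # A successful repair replaces the seeded (or previous) timestamp.
        self._last_repair[cf_name] = self._clock()

    def seconds_since_last_repair(self, cf_name):
        # Always a non-negative age for a known CF; no sentinel value needed.
        return self._clock() - self._last_repair[cf_name]
```

Under this assumption, a CF that has never been repaired simply reports its age 
since creation, which a monitoring threshold handles like any other stale value.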

If this is easy to do, I think it would make for a really clean solution from 
the point of view of the user. The nodetool command would always return valid 
data except if something is truly broken; not even a single edge case to deal 
with. Simplicity rocks for this type of thing (for writing a monitoring script 
to trigger an alarm).

If that's overkill or not easy, I dunno; I have a slight preference for 
throwing an exception, just because I really dislike silent failures. Returning 
an out-of-band integer seems more likely to go unnoticed if it never changes, 
for example because repair is *never* run. I.e., either your monitoring script 
treats -1 as an error anyway (so an exception is no worse in terms of 
triggering the alarm unnecessarily), or it doesn't, in which case you have a 
silent failure mode when repair perpetually fails to run.
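
The difference between the two failure modes can be sketched as follows. This 
is an illustration in Python under stated assumptions, not the nodetool 
interface: the -1 sentinel comes from the discussion above, while the function 
names, the RepairNeverRun exception and the "ALARM"/"OK" results are 
hypothetical.

```python
class RepairNeverRun(Exception):
    """Hypothetical error raised when no successful repair is recorded."""


def check_sentinel_naive(seconds_since_repair, max_age):
    # Naive threshold check: -1 is never greater than max_age, so a CF that
    # was never repaired silently passes.
    return "ALARM" if seconds_since_repair > max_age else "OK"


def check_sentinel_careful(seconds_since_repair, max_age):
    # Correct only if the script author remembers to special-case the sentinel.
    if seconds_since_repair < 0 or seconds_since_repair > max_age:
        return "ALARM"
    return "OK"


def check_with_exception(fetch, max_age):
    # fetch is a hypothetical callable returning the age in seconds or raising
    # RepairNeverRun; the failure cannot be mistaken for a small, healthy age.
    try:
        age = fetch()
    except RepairNeverRun:
        return "ALARM"
    return "ALARM" if age > max_age else "OK"
```

With the sentinel, the naive check reports "OK" forever if repair never runs. 
With the exception, even a script that forgets the try/except fails loudly 
rather than silently.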

> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.7.5
>
>         Attachments: CASSANDRA-2405.patch
>
>
> The practical implementation issues of actually ensuring repair runs is 
> somewhat of an undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since 
> last successful repair for a particular column family, to make it easier to 
> write a correct script to monitor for lack of repair in a non-buggy fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira