[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245320#comment-15245320
 ] 

Sylvain Lebresne commented on CASSANDRA-11547:
----------------------------------------------

While I wouldn't say that I'm strongly against this, it does feel to me like
this is more about "monitoring your system" than "monitoring C*", and I think
we recently reached a reasonable consensus that the former shouldn't
be C*'s business. Overall, isn't monitoring this externally pretty simple, and
probably much better, since you can do so without having to care about GC
pauses?

I'll note in particular that I'm a little worried about people being misled
into thinking there is a problem with their clock when they really have a problem
with their GC tuning (and while the patch tries to "avoid" this through some
error tolerance, I also question how we can choose, or even figure out, what a
good default for that tolerance is).

> Add background thread to check for clock drift
> ----------------------------------------------
>
>                 Key: CASSANDRA-11547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: clocks, time
>
> The system clock can drift while a system is running. As a
> simple way to check whether this occurs, we can run a background thread that wakes
> up every n seconds, reads the system clock, and checks whether n
> seconds have, indeed, passed.
> * If the clock's current time is less than the last recorded time (captured n
> seconds in the past), we know the clock has jumped backward.
> * If fewer than n seconds have elapsed, we know the system clock is running slow or
> has moved backward (by less than n seconds).
> * If between n and (n + a small offset) seconds have elapsed, we can assume we are
> within an acceptable window of clock movement. The offset accounts for the
> clock-checking thread not being scheduled on time, garbage collection pauses,
> and so on.
> * If more than (n + a small offset) seconds have elapsed, we can assume
> the clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.
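A minimal Java sketch of the check described in the quoted issue, assuming millisecond wall-clock readings from System.currentTimeMillis(). The names ClockDriftChecker, Drift, classify, intervalMillis, and toleranceMillis are hypothetical and are not Cassandra's API; the AtomicLong counter and the stderr message stand in for a real metric and logger. The wait between checks comes from a ScheduledExecutorService, whose delays are relative (in OpenJDK they are tracked with the monotonic System.nanoTime()), so the sketch compares a monotonic wait against wall-clock movement. It reads the "(n + a small offset)" bullet as "between n and n + offset".

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed check; names are illustrative, not Cassandra's API.
public class ClockDriftChecker {
    public enum Drift { BACKWARD, SLOW, OK, FORWARD }

    // Classifies one wall-clock observation. All arguments are in milliseconds:
    // intervalMillis is n, toleranceMillis is the "small offset".
    public static Drift classify(long previousMillis, long currentMillis,
                                 long intervalMillis, long toleranceMillis) {
        long elapsed = currentMillis - previousMillis;
        if (elapsed < 0) return Drift.BACKWARD;           // earlier than the last reading
        if (elapsed < intervalMillis) return Drift.SLOW;  // less than n elapsed
        if (elapsed <= intervalMillis + toleranceMillis) return Drift.OK;
        return Drift.FORWARD;                             // more than n + offset elapsed
    }

    private final AtomicLong driftEvents = new AtomicLong(); // stand-in for a real metric
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "clock-drift-checker");
                t.setDaemon(true); // do not keep the JVM alive just for this check
                return t;
            });
    private volatile long lastMillis;

    public void start(long intervalMillis, long toleranceMillis) {
        lastMillis = System.currentTimeMillis();
        // Fixed delay (not fixed rate), so a late run is not followed by a
        // catch-up run that would look like a slow clock.
        executor.scheduleWithFixedDelay(() -> {
            long now = System.currentTimeMillis();
            Drift drift = classify(lastMillis, now, intervalMillis, toleranceMillis);
            if (drift != Drift.OK) {
                driftEvents.incrementAndGet();
                System.err.printf("Possible clock drift (%s): %d ms elapsed, expected %d ms%n",
                        drift, now - lastMillis, intervalMillis);
            }
            lastMillis = now; // re-baseline even after a detected jump
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public long driftEvents() { return driftEvents.get(); }

    public void stop() { executor.shutdownNow(); }
}
```

Usage would look like `new ClockDriftChecker().start(10_000, 1_000);`. Note that a GC pause (or a descheduled checker thread) longer than toleranceMillis is indistinguishable here from a forward clock jump and is reported as FORWARD, which is exactly the false positive raised in the comment above.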



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
