[ 
https://issues.apache.org/jira/browse/CASSANDRA-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253982#comment-15253982
 ] 

Robert Stupp commented on CASSANDRA-11547:
------------------------------------------

bq. strong warning or even freeze

I'm not excited about freezing a node whenever some {{if (clockDrift > X)}} 
check triggers. This can (and in most installations will) lead to a complete 
outage of the cluster.

bq. warning ... out of sync with the majority of the cluster

Is it the majority (quorum?) of all nodes, of all live nodes, or of all 
reachable nodes? I think that is way too complicated.

Issuing a warning as in this patch is absolutely fine IMO. If someone wants to 
freeze a node when such a warning is issued, that is still possible by 
monitoring the log file. It is also possible to send an alert the same way, as 
many people already monitor the log file for errors and warnings.

> Add background thread to check for clock drift
> ----------------------------------------------
>
>                 Key: CASSANDRA-11547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11547
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: clocks, time
>
> The system clock has the potential to drift while a system is running. As a 
> simple way to check if this occurs, we can run a background thread that wakes 
> up every n seconds, reads the system clock, and checks to see if, indeed, n 
> seconds have passed. 
> * If the clock's current time is less than the last recorded time (captured n 
> seconds in the past), we know the clock has jumped backward.
> * If fewer than n seconds have elapsed, we know the system clock is running 
> slow or has moved backward (by a value less than n).
> * If between n and (n + a small offset) seconds have elapsed, we can assume 
> we are within an acceptable window of clock movement. Reasons for including 
> an offset are that the clock-checking thread might not have been scheduled 
> on time, garbage collection pauses, and so on.
> * If more than (n + a small offset) seconds have elapsed, we can assume the 
> clock jumped forward.
> In the unhappy cases, we can write a message to the log and increment some 
> metric that the user's monitoring systems can trigger/alert on.
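
The check described in the issue above could be sketched roughly as follows. 
This is only an illustration and not the code from the patch: the class name 
{{ClockDriftCheck}}, the {{Drift}} enum, the {{unhappyCount}} counter (a 
stand-in for a real metric) and the log line format are all hypothetical, and 
the sketch assumes wall-clock time is read via {{System.currentTimeMillis()}}.

```java
/** Sketch of the clock check described in the issue; all names are hypothetical. */
final class ClockDriftCheck {
    enum Drift { JUMPED_BACKWARD, SLOW_OR_SMALL_BACKWARD, OK, JUMPED_FORWARD }

    /**
     * Classifies one wake-up. previousMillis is the wall-clock reading taken
     * before sleeping, currentMillis is the reading taken after waking up.
     */
    static Drift classify(long previousMillis, long currentMillis,
                          long intervalMillis, long toleranceMillis) {
        long elapsed = currentMillis - previousMillis;
        if (elapsed < 0) return Drift.JUMPED_BACKWARD;            // now is before the last reading
        if (elapsed < intervalMillis) return Drift.SLOW_OR_SMALL_BACKWARD;
        if (elapsed <= intervalMillis + toleranceMillis) return Drift.OK;
        return Drift.JUMPED_FORWARD;                              // more than n + offset elapsed
    }

    // Stand-in for a real metric that monitoring systems could alert on.
    private static long unhappyCount;

    static synchronized long unhappyCount() { return unhappyCount; }

    private static synchronized void recordUnhappy() { unhappyCount++; }

    /** Starts a daemon thread that wakes up every intervalMillis and checks the clock. */
    static Thread start(final long intervalMillis, final long toleranceMillis) {
        Thread t = new Thread(() -> {
            long previous = System.currentTimeMillis();
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return; // stop checking once interrupted
                }
                long now = System.currentTimeMillis();
                Drift drift = classify(previous, now, intervalMillis, toleranceMillis);
                if (drift != Drift.OK) {
                    // Unhappy case: only warn and count, never freeze the node.
                    recordUnhappy();
                    System.err.println("WARN clock drift check: " + drift
                            + " (elapsed " + (now - previous) + " ms, expected about "
                            + intervalMillis + " ms)");
                }
                previous = now;
            }
        }, "clock-drift-check");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Something like {{ClockDriftCheck.start(1000, 100)}} would check once per second 
with a 100 ms allowance for scheduling delays and GC pauses. Keeping 
{{classify}} free of side effects means the thresholds can be tested without 
touching the real clock.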



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
