[ 
https://issues.apache.org/jira/browse/CASSANDRA-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-9793:
----------------------------------------
    Description: 
When a node has clock skew and cross node timeouts are enabled, there's no 
indication that the messages were dropped due to the cross timeout, just that 
messages were dropped.  This can errantly lead you down a path of 
troubleshooting a load shedding situation when really you just have clock drift 
on one node.  This is also not simple to troubleshoot, since you have to 
determine that this node will answer requests, but other nodes won't answer 
requests from it.  If the problem goes away on a reboot (and the machine does 
one-shot time sync, not continuous) it becomes even harder to detect because 
you're left with a weird piece of evidence such as "it's fine after a reboot, 
but comes back in about X days every time."

It would help tremendously if there were a log message indicating how many 
messages (don't need them broken down by type) were eagerly dropped due to the 
cross node timeout.

  was:
When a node has clock skew and cross node timeouts are enabled, there's no 
indication that the messages were dropped due to the cross timeout, just that 
messages were dropped.  This can errantly lead you down a path of 
troubleshooting a load shedding situation when really you just have clock drift 
on one node.  This is also not simple to troubleshoot, since you have to 
determine that this node will answer requests, but other nodes won't answer 
requests from it.  If the problem goes away on a reboot (and the machine does 
one-shot time sync, not continuos) it becomes even harder to detect because 
you're left with a weird piece of evidence such as "it's fine after a reboot, 
but comes back in about X days every time."

It would help tremendously if there were a log message indicating how many 
messages (don't need them broken down by type) were eagerly dropped due to the 
cross node timeout.


> Log when messages are dropped due to cross_node_timeout
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9793
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9793
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brandon Williams
>             Fix For: 2.1.x, 2.0.x
>
>
> When a node has clock skew and cross node timeouts are enabled, there's no 
> indication that the messages were dropped due to the cross timeout, just that 
> messages were dropped.  This can errantly lead you down a path of 
> troubleshooting a load shedding situation when really you just have clock 
> drift on one node.  This is also not simple to troubleshoot, since you have 
> to determine that this node will answer requests, but other nodes won't 
> answer requests from it.  If the problem goes away on a reboot (and the 
> machine does one-shot time sync, not continuous) it becomes even harder to 
> detect because you're left with a weird piece of evidence such as "it's fine 
> after a reboot, but comes back in about X days every time."
>
> It would help tremendously if there were a log message indicating how many 
> messages (don't need them broken down by type) were eagerly dropped due to 
> the cross node timeout.
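
To make the request concrete, here is a minimal sketch of the kind of counting and logging described above. It is not Cassandra's actual code: the class name CrossNodeDropCounter, its methods, and the wording of the summary line are all hypothetical, and the timeout value is whatever the caller passes in. The assumption is only that the receiver compares the sender's timestamp with its own clock, so clock skew between the two nodes can make a fresh message look expired.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only, not part of Cassandra.
// Counts messages dropped on arrival because their apparent age,
// computed across two different clocks, already exceeds the timeout.
final class CrossNodeDropCounter {
    private final long timeoutMillis;
    private final AtomicLong dropped = new AtomicLong();

    CrossNodeDropCounter(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // sentAtMillis is stamped by the sender's clock, nowMillis is read from
    // the local clock. If the local clock runs ahead of the sender's, the
    // apparent age is inflated by the skew and the message may be dropped
    // even though it arrived promptly.
    boolean shouldDrop(long sentAtMillis, long nowMillis) {
        boolean expired = nowMillis - sentAtMillis > timeoutMillis;
        if (expired) {
            dropped.incrementAndGet();
        }
        return expired;
    }

    // Returns a summary line and resets the counter, or null when nothing
    // was dropped since the last call. A periodic task could pass the
    // result to the logger.
    String drainSummary() {
        long count = dropped.getAndSet(0);
        return count == 0 ? null : "messages dropped due to cross node timeout: " + count;
    }
}
```

Keeping this counter separate from the general dropped-message count is the point: a non-zero value here suggests checking clock sync before investigating load shedding.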



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
