[ 
https://issues.apache.org/jira/browse/CASSANDRA-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167537#comment-16167537
 ] 

Sylvain Lebresne commented on CASSANDRA-13875:
----------------------------------------------

bq. which defeats the purpose of CASSANDRA-9793

I'm going to argue that this is not true, and for what it's worth, [I did 
justify the change and why it wasn't, imo, defeating CASSANDRA-9793 on 
CASSANDRA-12791|https://issues.apache.org/jira/browse/CASSANDRA-12791?focusedCommentId=15621787&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15621787].
 But to reiterate it here, the purpose of CASSANDRA-9793, as I understand it, was 
to make it easy for users who see a lot of dropped messages to tell whether 
it may simply be due to them having {{cross_node_timeout=true}} and clock skew 
between nodes making us drop messages that shouldn't really be dropped. This is 
still very much possible: if you see lots of cross-node dropped messages and 
few internal ones, then check your yaml to see if {{cross_node_timeout}} is 
set, and if it is, _boom_, you know the problem. No purpose is lost.

On the other side, if {{cross_node_timeout=false}}, you used to get no 
information whatsoever from the "cross-node" part of the log message since it 
was always 0. Now, you do get at least some info as you always get distinct 
counts for messages delivered internally and those that have crossed nodes. And, 
yet again, _without_ real loss of information since you can always check your 
yaml to see if a large number of cross-node dropped messages may be due to 
clock skew between nodes. I also want to note that the message we log is:
{noformat}
M messages were dropped in last N ms: X internal and Y cross node.
{noformat}
and I'm still 100% convinced that any user seeing this for the first time will 
intuitively understand "and Y cross node" as being "the count of dropped 
messages that were not internally delivered" (as is always the case 
post-CASSANDRA-12791), *not* "the count of dropped messages that were not 
internally delivered _if_ cross_node_timeout is set to true, but otherwise a 
useless value that is always 0" (as was the case pre-CASSANDRA-12791). So a 
motivation of the change was also to make things a bit more intuitive.
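
To make the triage described above concrete (many cross-node drops, few internal ones, then check the yaml), here is a minimal sketch in Python. It is not Cassandra code: the function names, the regular expression, and the {{ratio}} threshold are all hypothetical, and the pattern assumes the summary line looks exactly like the template quoted above, whereas a real log line may carry additional fields.

```python
import re

# Pattern for the summary line as quoted in this comment; the line a real
# node writes may carry extra fields, so treat this as an assumption.
DROPPED_RE = re.compile(
    r"(\d+) messages were dropped in last (\d+) ms: "
    r"(\d+) internal and (\d+) cross node"
)


def parse_dropped_line(line):
    """Return (total, interval_ms, internal, cross_node), or None if no match."""
    match = DROPPED_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def clock_skew_suspected(internal, cross_node, cross_node_timeout, ratio=10):
    """Hypothetical heuristic: skew is only a candidate when cross_node_timeout
    is enabled and cross-node drops dominate internal ones by `ratio`."""
    if not cross_node_timeout:
        # With the setting off, the counts alone say nothing about clock skew.
        return False
    return cross_node > 0 and cross_node >= ratio * max(internal, 1)
```

The {{cross_node_timeout}} argument stands in for the value read from cassandra.yaml, which is exactly the cross-referencing step the log message alone cannot do. The default {{ratio}} of 10 is arbitrary and only illustrates "lots of cross-node, few internal".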

All this being said, it is very true that one thing that was "lost" is that you 
cannot from the log message *alone* decide if a large number of dropped messages 
is almost surely due to clock skew; you have to cross-reference the info with 
your yaml setting for that. As I mentioned in CASSANDRA-12791, I felt this was ok 
because even the reporter of CASSANDRA-9793 seemed to be fine with [deriving 
things from the 
yaml|https://issues.apache.org/jira/browse/CASSANDRA-9793?focusedCommentId=14635441&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14635441].
 But I'm more than happy to update the log message to fix that and generally 
make things more explicit. For instance, we could, when {{cross_node_timeout}} 
is {{true}}, add some warning to the message so it looks like:
{noformat}
M messages were dropped in last N ms: X internal and Y cross node 
(cross_node_timeout is true in cassandra.yaml so a large number of cross node 
dropped messages may be indicative of clock skew between nodes of the cluster, 
especially if the number of internal dropped messages is low; if you are not 
running ntp on your nodes, we advise you do, but if that doesn't help, we 
advise turning cross_node_timeout off).
{noformat}
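
As a rough illustration of that proposal, the sketch below builds the summary line and appends the hint only when {{cross_node_timeout}} is enabled. It is written in Python for brevity and is not the actual Cassandra logging code; the function name and parameters are hypothetical, and the hint text is an abbreviated, lightly reworded version of the wording suggested above.

```python
def format_dropped_summary(total, interval_ms, internal, cross_node,
                           cross_node_timeout):
    """Build the summary line, adding the hint only when the setting is on."""
    message = (
        f"{total} messages were dropped in last {interval_ms} ms: "
        f"{internal} internal and {cross_node} cross node"
    )
    if cross_node_timeout:
        # Abbreviated form of the hint proposed in this comment.
        message += (
            " (cross_node_timeout is true in cassandra.yaml so a large "
            "number of cross node dropped messages may be indicative of "
            "clock skew between nodes of the cluster)"
        )
    return message + "."
```

With the setting off, the line is unchanged from today's format, so anyone already parsing it would only see the extra text when the hint is relevant.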

> cross node timeout logging is incorrect
> ---------------------------------------
>
>                 Key: CASSANDRA-13875
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13875
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Joel Knighton
>
> In CASSANDRA-12791 we changed some logic here, and what I observe now is logs 
> indicating cross-node timeouts, when in fact cross_node_timeout is set to 
> false, which defeats the purpose of CASSANDRA-9793.



