[ https://issues.apache.org/jira/browse/CASSANDRA-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167537#comment-16167537 ]
Sylvain Lebresne commented on CASSANDRA-13875:
----------------------------------------------

bq. which defeats the purpose of CASSANDRA-9793

I'm going to argue that this is not true, and for what it's worth, [I did justify the change and why it wasn't, imo, defeating CASSANDRA-9793 on CASSANDRA-12791|https://issues.apache.org/jira/browse/CASSANDRA-12791?focusedCommentId=15621787&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15621787].

But to reiterate it here, the purpose of CASSANDRA-9793, as I understand it, was to make it easy for users who see a lot of dropped messages to know whether this may simply be due to them having {{cross_node_timeout=true}} and clock skew between nodes making us drop messages that shouldn't really be dropped. This is still very much possible: if you see lots of cross-node dropped messages and few internal ones, then check your yaml to see if {{cross_node_timeout}} is set, and if it is, _boom_, you know the problem. No purpose is lost.

On the other side, if {{cross_node_timeout=false}}, you used to get no information whatsoever from the "cross-node" part of the log message since it was always 0. Now, you do get at least some info, as you always get distinct counts for messages delivered internally and those that have crossed nodes. And, yet again, _without_ real loss of information, since you can always check your yaml to see if a large amount of cross-node dropped messages may be due to clock skew between nodes.

I also want to note that the message we log is:
{noformat}
M messages were dropped in last N ms: X internal and Y cross node.
{noformat}
and I'm still 100% convinced that any user seeing this for the first time will intuitively understand "and Y cross node" as being "the count of dropped messages that were not internally delivered" (as is always the case post-CASSANDRA-12791), *not* "the count of dropped messages that were not internally delivered _if_ cross_node_timeout is set to true, but otherwise a useless value that is always 0" (as was the case pre-CASSANDRA-12791). So a motivation of the change was also to make things a bit more intuitive.

All this being said, it is very true that one thing that was "lost" is that you cannot tell from the log message *alone* whether a large number of dropped messages is almost surely due to clock skew; you have to cross-reference the info with your yaml setting for that. As I mentioned in CASSANDRA-12791, I felt this was ok because even the reporter of CASSANDRA-9793 seemed to be fine with [deriving things from the yaml|https://issues.apache.org/jira/browse/CASSANDRA-9793?focusedCommentId=14635441&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14635441].

But I'm more than happy to update the log message to fix that and generally make things more explicit. For instance, we could, when {{cross_node_timeout}} is {{true}}, add some warning to the message so it looks like:
{noformat}
M messages were dropped in last N ms: X internal and Y cross node (cross_node_timeout is true in cassandra.yaml so a large amount of cross node dropped message may be indicative of clock skew between nodes of the cluster, especially if the number of internal dropped message is low; if you are not running ntp on your nodes, we advise you do, but if that doesn't help, we advise turning cross_node_timeout off).
{noformat}

> cross node timeout logging is incorrect
> ---------------------------------------
>
> Key: CASSANDRA-13875
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13875
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Brandon Williams
> Assignee: Joel Knighton
>
> In CASSANDRA-12791 we changed some logic here, and what I observe now is logs
> indicating cross-node timeouts, when in fact cross_node_timeout is set to
> false, which defeats the purpose of CASSANDRA-9793.
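
For illustration only, the log format proposed in the comment can be sketched as a small function: the two counts are always reported, and the clock-skew hint is appended only when {{cross_node_timeout}} is true. This is a minimal Python sketch, not Cassandra's actual (Java) logging code. The function name, parameter names and the `CLOCK_SKEW_HINT` constant are hypothetical, and since the comment does not spell out what the leading "M" stands for, it is passed in as an opaque label.

```python
# Hypothetical sketch of the log message proposed in the comment.
# Names are illustrative; this is not Cassandra's real implementation.

# Hint text copied from the proposed message in the comment.
CLOCK_SKEW_HINT = (
    "cross_node_timeout is true in cassandra.yaml so a large amount of "
    "cross node dropped message may be indicative of clock skew between "
    "nodes of the cluster, especially if the number of internal dropped "
    "message is low; if you are not running ntp on your nodes, we advise "
    "you do, but if that doesn't help, we advise turning "
    "cross_node_timeout off"
)


def format_dropped_message(label: str, interval_ms: int, internal: int,
                           cross_node: int, cross_node_timeout: bool) -> str:
    """Build the dropped-messages log line.

    `label` stands for the leading "M" of the message (assumption: the
    comment does not define it, so it is treated as an opaque string).
    """
    message = (
        f"{label} messages were dropped in last {interval_ms} ms: "
        f"{internal} internal and {cross_node} cross node"
    )
    # Only mention clock skew when the setting that makes it relevant is on.
    if cross_node_timeout:
        message += f" ({CLOCK_SKEW_HINT})"
    return message + "."


if __name__ == "__main__":
    print(format_dropped_message("M", 5000, 3, 7, cross_node_timeout=False))
    print(format_dropped_message("M", 5000, 1, 40, cross_node_timeout=True))
```

With this shape, the counts mean the same thing regardless of the yaml setting, and the reader no longer has to cross-reference cassandra.yaml to know whether clock skew is a plausible cause.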