[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped

2021-08-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16877:

      Fix Version/s: 4.0.1  (was: 4.0.x)
      Since Version: 4.0-alpha1
Source Control Link: https://github.com/apache/cassandra/commit/b8242730918c2e8edec83aeafeeae8255378125d
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

Thanks. Agreed about the tests, so committed (with one nit addressed and one 
swerved) to 4.0 in {{b8242730918c2e8edec83aeafeeae8255378125d}} and merged up 
to trunk.

> High priority internode messages which exceed the large message threshold are 
> dropped
> -
>
> Key: CASSANDRA-16877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16877
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Sam Tunnicliffe
> Assignee: Sam Tunnicliffe
> Priority: Normal
> Fix For: 4.0.1
>
>
> Currently, there is an assumption that internode messages whose verb has 
> priority P0 will always fit within a single messaging frame. While this is 
> usually the case, the assumption does not always hold. One example is gossip 
> messages during the startup shadow round, where in very large clusters the 
> digest ack can contain all states for every peer. In this scenario, the 
> respondent fails to send the ack, which may lead to the shadow round and, 
> ultimately, the startup failing.
>  
> We could tweak the shadow round acks to minimise the message size, but a more 
> robust solution would be to permit high priority messages to be sent on the 
> large messages connection when necessary.
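
The routing change proposed in the description can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class, enum and method names are hypothetical, the 64 KiB threshold is a placeholder, and none of it is taken from the committed patch or from Cassandra's actual messaging classes.

```java
// Hypothetical sketch only: names and the threshold value are assumptions,
// not Cassandra's internode messaging API.
final class ConnectionRoutingSketch
{
    enum Priority { P0, P1, P2 }

    enum ConnectionType { URGENT_MESSAGES, SMALL_MESSAGES, LARGE_MESSAGES }

    // Assumed size in bytes above which a message no longer fits in a single frame.
    static final int LARGE_MESSAGE_THRESHOLD = 64 * 1024;

    // The assumption described in the ticket: a P0 message always goes to the
    // urgent connection, whatever its size, so an oversized one cannot be sent.
    static ConnectionType routeBefore(Priority priority, int serializedSize)
    {
        if (priority == Priority.P0)
            return ConnectionType.URGENT_MESSAGES;
        return serializedSize > LARGE_MESSAGE_THRESHOLD
               ? ConnectionType.LARGE_MESSAGES
               : ConnectionType.SMALL_MESSAGES;
    }

    // The proposed behaviour: check the size first, so that an oversized P0
    // message is permitted on the large messages connection.
    static ConnectionType routeAfter(Priority priority, int serializedSize)
    {
        if (serializedSize > LARGE_MESSAGE_THRESHOLD)
            return ConnectionType.LARGE_MESSAGES;
        return priority == Priority.P0
               ? ConnectionType.URGENT_MESSAGES
               : ConnectionType.SMALL_MESSAGES;
    }

    public static void main(String[] args)
    {
        int oversizedAck = 200_000; // placeholder size for a large digest ack
        System.out.println("before: " + routeBefore(Priority.P0, oversizedAck));
        System.out.println("after:  " + routeAfter(Priority.P0, oversizedAck));
    }
}
```

In this sketch, messages under the threshold are routed exactly as before; only the oversized P0 case changes.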



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped

2021-08-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16877:

Status: Ready to Commit  (was: Review In Progress)







[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped

2021-08-24 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16877:

Reviewers: Caleb Rackliffe, Caleb Rackliffe  (was: Caleb Rackliffe)
   Status: Review In Progress  (was: Patch Available)







[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped

2021-08-23 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16877:

Reviewers: Caleb Rackliffe







[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped

2021-08-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16877:

Test and Documentation Plan: New unit test in patch
                     Status: Patch Available  (was: In Progress)

This can be a fairly serious problem when cluster sizes run into hundreds of 
nodes. Following a restart, the shadow round can begin to fail as the 
respondents can't serialize the {{GossipDigestAck}}, leaving the sender unable 
to start. Even if the shadow round is skipped, the same problem often occurs 
when the starting node first sends a regular {{GossipDigestSyn}}. This may be 
an even worse scenario from an availability perspective, as the restarted node 
will not establish contact with peers and will see the rest of the ring as 
down. 

The reason I didn't add more detail to the nospam log message is that doing so 
always feels of limited utility to me. With nospam you're probably only 
seeing a tiny portion of the actual events, so you really need to enable the 
trace/debug logging anyway. If people have a strong opinion to the contrary 
though, I can always change this. 
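
To illustrate why a rate-limited log line shows only a small fraction of events, here is a minimal sketch of a time-based limiter. It is an assumption-laden illustration: the class name, the interval and the event spacing are hypothetical, and this is not the implementation of Cassandra's nospam logger.

```java
// Hypothetical sketch only: a simple time-based log limiter, not Cassandra's
// nospam logging implementation.
final class RateLimitedLogSketch
{
    private final long minIntervalNanos;
    private long lastLoggedNanos;
    private boolean loggedOnce;
    private long suppressed;

    RateLimitedLogSketch(long minIntervalNanos)
    {
        this.minIntervalNanos = minIntervalNanos;
    }

    // Emits the message if at least minIntervalNanos have passed since the
    // last emitted one; otherwise counts the event as suppressed.
    boolean tryLog(long nowNanos, String message)
    {
        if (loggedOnce && nowNanos - lastLoggedNanos < minIntervalNanos)
        {
            suppressed++;
            return false;
        }
        loggedOnce = true;
        lastLoggedNanos = nowNanos;
        System.out.println(message);
        return true;
    }

    long suppressedCount()
    {
        return suppressed;
    }

    public static void main(String[] args)
    {
        // Assumed workload: 1000 events, 1 ms apart, limited to one line per second.
        RateLimitedLogSketch log = new RateLimitedLogSketch(1_000_000_000L);
        int emitted = 0;
        for (int i = 0; i < 1000; i++)
            if (log.tryLog(i * 1_000_000L, "message dropped (event " + i + ")"))
                emitted++;
        System.out.println("emitted=" + emitted + " suppressed=" + log.suppressedCount());
    }
}
```

Under these assumed numbers only the first event is written and the remaining ones are counted but not shown, which is why extra detail in the rate-limited message helps less than enabling trace/debug logging.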

||branch||Circle CI||Apache CI||
|[16877-4.0|https://github.com/beobal/cassandra/tree/16877-4.0]|[circle|https://circleci.com/gh/beobal/cassandra?branch=16877-4.0]|[apache|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1058]|








[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped

2021-08-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16877:

Bug Category: Parent values: Availability(12983); Level 1 values: Process Crash(12992)
              (was: Parent values: Availability(12983); Level 1 values: Cluster Crash(12993))







[jira] [Updated] (CASSANDRA-16877) High priority internode messages which exceed the large message threshold are dropped

2021-08-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16877:

 Bug Category: Parent values: Availability(12983); Level 1 values: Cluster Crash(12993)
   Complexity: Normal
Discovered By: Adhoc Test
Fix Version/s: 4.0.x
     Severity: Critical
       Status: Open  (was: Triage Needed)



