[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs

2022-01-17 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-17168:

      Fix Version/s: 4.0.2
                     4.1
                         (was: 4.x)
                         (was: 4.0.x)
      Since Version: 4.0
Source Control Link: https://github.com/apache/cassandra/commit/98e798f567368f826fc3a57ddb6cdc464e741fe3
         Resolution: Fixed
             Status: Resolved  (was: Ready to Commit)

> Don't block gossip when clearing snapshots for failing repairs
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-17168
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17168
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 4.0.2, 4.1
>
>
> We clear snapshots in the GossipTasks thread when a repair session fails due 
> to a replica shutting down. If many tables or repair sessions are in flight, 
> this can take a long time. With enough tables being repaired at the same time, 
> even checking whether the snapshots exist can take long enough for nodes to 
> be marked down.
> We should clear snapshots in a separate thread and add a flag that tells us 
> whether this repair session can have snapshots, so that we avoid checking 
> whether the directory exists.
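
The approach described above can be illustrated with a minimal Java sketch. It is not the committed patch: the class, the `Session` type, the `mayHaveSnapshots` field and the single-threaded executor are all hypothetical names chosen for illustration. It only shows the two ideas from the description, namely skipping the filesystem check when a session cannot have snapshots, and handing the actual clearing to another thread so the caller (standing in for the gossip thread) returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: none of these names come from the Cassandra code base.
class SnapshotCleanupSketch {
    /** Hypothetical stand-in for a repair session. */
    static final class Session {
        final String id;
        final boolean mayHaveSnapshots; // the "flag" from the description

        Session(String id, boolean mayHaveSnapshots) {
            this.id = id;
            this.mayHaveSnapshots = mayHaveSnapshots;
        }
    }

    // Dedicated thread for slow snapshot cleanup (assumed design).
    private final ExecutorService snapshotCleaner = Executors.newSingleThreadExecutor();
    final AtomicInteger cleared = new AtomicInteger();

    /**
     * Called from the thread that notices the replica going down.
     * Does no disk I/O itself; returns true if cleanup was scheduled.
     */
    boolean onReplicaDown(Session session) {
        if (!session.mayHaveSnapshots) {
            return false; // skip even the directory-exists check
        }
        snapshotCleaner.execute(() -> clearSnapshot(session));
        return true;
    }

    private void clearSnapshot(Session session) {
        // Placeholder for the potentially slow filesystem work.
        cleared.incrementAndGet();
    }

    /** Stops accepting work and waits briefly for queued cleanups to finish. */
    void shutdownAndWait() {
        snapshotCleaner.shutdown();
        try {
            snapshotCleaner.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SnapshotCleanupSketch sketch = new SnapshotCleanupSketch();
        System.out.println(sketch.onReplicaDown(new Session("with-snapshots", true)));
        System.out.println(sketch.onReplicaDown(new Session("no-snapshots", false)));
        sketch.shutdownAndWait();
        System.out.println("cleared: " + sketch.cleared.get());
    }
}
```

The point of the design is that the calling thread only ever pays for a boolean check and a queue insertion, regardless of how many tables are being repaired.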



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs

2022-01-17 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-17168:

Status: Ready to Commit  (was: Review In Progress)







[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs

2021-11-30 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-17168:
--
Test and Documentation Plan: tests
                     Status: Patch Available  (was: Open)







[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs

2021-11-30 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-17168:
--
Reviewers: David Capwell, David Capwell  (was: David Capwell)
   Status: Review In Progress  (was: Patch Available)







[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs

2021-11-24 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-17168:

 Bug Category: Parent values: Availability(12983) Level 1 values: Unavailable(12994)
   Complexity: Normal
  Component/s: Consistency/Repair
Discovered By: Adhoc Test
Fix Version/s: 4.0.x
               4.x
    Reviewers: David Capwell
     Severity: Normal
       Status: Open  (was: Triage Needed)

trunk:
https://github.com/apache/cassandra/pull/1340
https://app.circleci.com/pipelines/github/krummas/cassandra?branch=marcuse%2F17168-trunk
4.0:
https://github.com/apache/cassandra/pull/1341 
https://app.circleci.com/pipelines/github/krummas/cassandra?branch=marcuse%2F17168

Note that the trunk version also changes the PREPARE message to include the 
repair parallelism, instead of setting a flag on ParentRepairSession.
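
As a rough illustration of the trunk variant mentioned in the note, the sketch below carries a parallelism value in a prepare message and derives "can this session have snapshots?" from it on the receiving side. Everything here is hypothetical: the `Parallelism` enum, the `PrepareMessage` class and its fields are invented for the example and are not the actual message format. It also assumes that only non-parallel repairs take snapshots, which should be checked against the patch itself.

```java
import java.util.UUID;

// Illustrative only: names and fields are assumptions, not Cassandra's API.
class PrepareMessageSketch {
    enum Parallelism { SEQUENTIAL, PARALLEL, DATACENTER_AWARE }

    /** Hypothetical prepare message carrying the repair parallelism. */
    static final class PrepareMessage {
        final UUID parentSessionId;
        final Parallelism parallelism;

        PrepareMessage(UUID parentSessionId, Parallelism parallelism) {
            this.parentSessionId = parentSessionId;
            this.parallelism = parallelism;
        }
    }

    /**
     * Assumption: fully parallel repairs do not snapshot, so there is
     * nothing to look for on disk when such a session fails.
     */
    static boolean mayHaveSnapshots(PrepareMessage message) {
        return message.parallelism != Parallelism.PARALLEL;
    }

    public static void main(String[] args) {
        for (Parallelism p : Parallelism.values()) {
            PrepareMessage message = new PrepareMessage(UUID.randomUUID(), p);
            System.out.println(p + " -> mayHaveSnapshots=" + mayHaveSnapshots(message));
        }
    }
}
```

Deriving the answer from the message means the receiving node needs no extra per-session flag, which is the trade-off the note describes.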



