[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs
     [ https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17168:
----------------------------------------
          Fix Version/s: 4.0.2
                         4.1
                             (was: 4.x)
                             (was: 4.0.x)
          Since Version: 4.0
    Source Control Link: https://github.com/apache/cassandra/commit/98e798f567368f826fc3a57ddb6cdc464e741fe3
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Don't block gossip when clearing snapshots for failing repairs
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-17168
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17168
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 4.0.2, 4.1
>
>
> We clear snapshots in the GossipTasks thread when a repair session fails due
> to a replica shutting down. If there are many tables/repair sessions ongoing
> this can take a long time. With enough tables being repaired at the same time
> even checking if the snapshots exist can take long enough to mark nodes down.
> We should clear snapshots in a separate thread and add a flag to tell us
> whether this repair session can have snapshots to avoid checking if the
> directory exists.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs
     [ https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17168:
----------------------------------------
    Status: Ready to Commit  (was: Review In Progress)

[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs
     [ https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-17168:
--------------------------------------
    Test and Documentation Plan: tests
                         Status: Patch Available  (was: Open)

[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs
     [ https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-17168:
--------------------------------------
    Reviewers: David Capwell, David Capwell  (was: David Capwell)
       Status: Review In Progress  (was: Patch Available)

[jira] [Updated] (CASSANDRA-17168) Don't block gossip when clearing snapshots for failing repairs
     [ https://issues.apache.org/jira/browse/CASSANDRA-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-17168:
----------------------------------------
     Bug Category: Parent values: Availability(12983)Level 1 values: Unavailable(12994)
       Complexity: Normal
      Component/s: Consistency/Repair
    Discovered By: Adhoc Test
    Fix Version/s: 4.0.x
                   4.x
        Reviewers: David Capwell
         Severity: Normal
           Status: Open  (was: Triage Needed)

trunk:
https://github.com/apache/cassandra/pull/1340
https://app.circleci.com/pipelines/github/krummas/cassandra?branch=marcuse%2F17168-trunk

4.0:
https://github.com/apache/cassandra/pull/1341
https://app.circleci.com/pipelines/github/krummas/cassandra?branch=marcuse%2F17168

Note that the trunk version includes a change to the PREPARE message to include repair parallelism instead of setting a flag on ParentRepairSession.
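
The approach the issue describes (move snapshot clearing off the gossip thread, and skip it entirely when a session cannot have snapshots) might be sketched roughly as below. This is an illustration only, not the committed patch: SnapshotCleanupSketch, Session, mayHaveSnapshots, onSessionFailed and clearSnapshot are all hypothetical names, and the real disk work is replaced by a placeholder that records the session id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; none of these names are taken from the Cassandra code base.
class SnapshotCleanupSketch {

    // Stand-in for a repair session. The flag records whether the session
    // could have created snapshots at all.
    static final class Session {
        final String id;
        final boolean mayHaveSnapshots;

        Session(String id, boolean mayHaveSnapshots) {
            this.id = id;
            this.mayHaveSnapshots = mayHaveSnapshots;
        }
    }

    // Dedicated single-thread executor so slow disk work never runs on the
    // thread that reports the failure (the gossip thread in the issue).
    private final ExecutorService cleanupExecutor =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "snapshot-cleanup");
                t.setDaemon(true);
                return t;
            });

    // Ids of sessions whose snapshots were "cleared"; used only by the demo.
    final List<String> cleared = Collections.synchronizedList(new ArrayList<>());

    // Called when a session fails. Returns without doing any disk I/O on the
    // calling thread: either nothing is scheduled, or the work is handed off.
    CompletableFuture<Void> onSessionFailed(Session session) {
        if (!session.mayHaveSnapshots) {
            // Flag says no snapshots are possible, so skip even the
            // "does the directory exist" check.
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.runAsync(() -> clearSnapshot(session), cleanupExecutor);
    }

    // Placeholder for the potentially slow snapshot removal.
    private void clearSnapshot(Session session) {
        cleared.add(session.id);
    }

    public static void main(String[] args) {
        SnapshotCleanupSketch sketch = new SnapshotCleanupSketch();
        sketch.onSessionFailed(new Session("no-snapshots", false)).join();
        sketch.onSessionFailed(new Session("with-snapshots", true)).join();
        System.out.println(sketch.cleared);
    }
}
```

The caller only waits on the returned future in the demo; in the scenario from the issue the failure-handling thread would presumably not join on it, which is the point of handing the work to a separate executor.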