[ https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451511#comment-17451511 ]
Berenguer Blasi commented on CASSANDRA-16446:
---------------------------------------------

[~dcapwell] I don't remember any specific reasons. Also, skimming the code, I don't see a reason why we couldn't clean up on failures as well. But this is not a part of the code I know by heart, so I guess the best thing is to give it a go and see what happens?

> Parent repair sessions leak may lead to node long pauses
> --------------------------------------------------------
>
>                 Key: CASSANDRA-16446
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16446
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0-rc1, 4.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> {{ActiveRepairService}} keeps a map {{parentRepairSessions}}. If these
> sessions leak, that map can grow large enough that, when a node restarts,
> {{ActiveRepairService.onRestart()}} triggers a cleanup of sessions that can
> pause nodes in the cluster for a long time.
> The proposed solution is for repairs to clean up these sessions on all nodes
> on completion by sending a CLEANUP message to the involved nodes. Tests rely on a
> new {{parentRepairSessionsCount()}} method on the parent repair sessions
> MBean to keep track of these.
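
For illustration only, here is a minimal sketch of the "clean up on failures too" idea from the comment above. It is not the Cassandra code: {{ParentSessionRegistry}}, {{register()}}, {{cleanup()}} and {{runRepair()}} are hypothetical names, and only {{parentRepairSessions}} and {{parentRepairSessionsCount()}} echo identifiers from the ticket. The real fix sends a CLEANUP message to the involved nodes; this sketch only models the local map, assuming a try/finally is enough to release the entry on both outcomes.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the session bookkeeping described in the ticket.
class ParentSessionRegistry {
    // Plays the role of the parentRepairSessions map: one entry per live session.
    private final Map<UUID, String> parentRepairSessions = new ConcurrentHashMap<>();

    // Registers a new parent session and returns its id (hypothetical helper).
    UUID register(String description) {
        UUID id = UUID.randomUUID();
        parentRepairSessions.put(id, description);
        return id;
    }

    // Drops the session entry; removing an unknown id is a no-op.
    void cleanup(UUID id) {
        parentRepairSessions.remove(id);
    }

    // Counter in the spirit of the MBean method the tests rely on.
    int parentRepairSessionsCount() {
        return parentRepairSessions.size();
    }

    // Runs the repair work and always removes the session afterwards,
    // whether the work returns normally or throws.
    <T> T runRepair(String description, Supplier<T> work) {
        UUID id = register(description);
        try {
            return work.get();
        } finally {
            cleanup(id);
        }
    }
}
```

With this shape, a test can assert that the count returns to zero after a failed repair as well as after a successful one, which is the kind of check the new counter makes possible.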