[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-12-01 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451934#comment-17451934
 ] 

David Capwell commented on CASSANDRA-16446:
---

Thanks. I saw it while refactoring and wasn't sure whether there was a good 
reason. What I see now is that FINALIZE_COMMIT and CLEANUP touch different 
maps, so I don't see a clear conflict with IR, and it makes sense to me to 
clean up on failure.
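
As a minimal sketch of what cleaning up on failure as well as on success could 
look like (this is not the actual patch; {{RepairCleanupSketch}}, 
{{activeParentSessions}} and {{runRepair}} are hypothetical names, and a plain 
in-memory set stands in for the real session state):

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names exist in Cassandra.
class RepairCleanupSketch {
    // Ids of the parent sessions this node still tracks.
    static final Set<UUID> activeParentSessions = ConcurrentHashMap.newKeySet();

    // Runs the repair body and releases the parent session whether the
    // body succeeds or throws.
    static boolean runRepair(UUID parentSessionId, Callable<Boolean> repairBody) {
        activeParentSessions.add(parentSessionId);
        try {
            return repairBody.call();
        } catch (Exception e) {
            // A failed session cannot be recovered, so just report the failure.
            return false;
        } finally {
            // Cleanup runs on the success path and on the failure path.
            activeParentSessions.remove(parentSessionId);
        }
    }
}
```

With the removal in the {{finally}} block, a repair that throws leaves no entry 
behind, which is the behaviour being discussed here.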

> Parent repair sessions leak may lead to node long pauses
> 
>
> Key: CASSANDRA-16446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16446
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-rc1, 4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {{ActiveRepairService}} keeps a map, {{parentRepairSessions}}. If these 
> sessions leak, that map can grow so large that, when a node restarts, 
> {{ActiveRepairService.onRestart()}} triggers a cleanup of sessions that can 
> pause nodes in a cluster for a long time.
> The proposed solution is for repairs to clean up these sessions on all nodes 
> on completion by sending a CLEANUP message to the involved nodes. Tests rely 
> on a new {{parentRepairSessionsCount()}} method on the parent repair sessions 
> MBean to keep track of these.
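
To illustrate the description above, here is a minimal sketch assuming a simple 
in-memory registry. {{ParentSessionRegistry}}, {{register}} and 
{{onCleanupMessage}} are hypothetical names, not Cassandra's actual classes or 
methods; only the map name and {{parentRepairSessionsCount()}} are taken from 
the ticket.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the map kept by ActiveRepairService.
class ParentSessionRegistry {
    private final Map<UUID, String> parentRepairSessions = new ConcurrentHashMap<>();

    // Called when this node is asked to take part in a repair.
    void register(UUID parentSessionId, String description) {
        parentRepairSessions.put(parentSessionId, description);
    }

    // Called when a CLEANUP message arrives for a completed repair.
    // Removing an unknown id is a no-op, so a duplicate message is harmless.
    void onCleanupMessage(UUID parentSessionId) {
        parentRepairSessions.remove(parentSessionId);
    }

    // Number of sessions still tracked; a test can assert this returns to zero.
    int parentRepairSessionsCount() {
        return parentRepairSessions.size();
    }
}
```

Without the cleanup call every registered session stays in the map, which is 
the leak described in the ticket; a test can check that the count drops back to 
zero once the repair completes.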



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-11-30 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451511#comment-17451511
 ] 

Berenguer Blasi commented on CASSANDRA-16446:
-

[~dcapwell] I don't remember any specific reason. Skimming the code, I also 
don't see why we couldn't clean up on failures as well. But this is not a part 
of the code I know by heart, so I guess the best approach is to give it a go 
and see what happens?




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-11-30 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451288#comment-17451288
 ] 

David Capwell commented on CASSANDRA-16446:
---

I don't see any conversation about this in JIRA or GH, so I was wondering why 
cleanup happens only on success and not on failure. The closest I can find is 
https://github.com/apache/cassandra/pull/896#discussion_r577334272

bq. Parent session is removed as part of the success() call path.

If a session has failed, we can't recover or really act on it...  Was the 
concern IR?




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-24 Thread Jaroslaw Grabowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290730#comment-17290730
 ] 

Jaroslaw Grabowski commented on CASSANDRA-16446:


Thank you [~Bereng]!




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-24 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290703#comment-17290703
 ] 

Berenguer Blasi commented on CASSANDRA-16446:
-

Thx for all the work :-)




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-24 Thread Andres de la Peña (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290195#comment-17290195
 ] 

Andres de la Peña commented on CASSANDRA-16446:
---

Committed to {{trunk}} as 
[23512cf3da5e8206d8797841f2238cdd86c13d96|https://github.com/apache/cassandra/commit/23512cf3da5e8206d8797841f2238cdd86c13d96].

Dtests committed as 
[c89dea0e8c38ed35ed40d59c975a07585584a637|https://github.com/apache/cassandra-dtest/commit/c89dea0e8c38ed35ed40d59c975a07585584a637].




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-24 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289937#comment-17289937
 ] 

Berenguer Blasi commented on CASSANDRA-16446:
-

Ah yes, I didn't rebase. I just added the @jira_ticket tag and the reformatted 
line. Apologies, I didn't realize you wanted it rebased.




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-24 Thread Andres de la Peña (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289869#comment-17289869
 ] 

Andres de la Peña commented on CASSANDRA-16446:
---

I see that the PRs are not rebased, and CircleCI is pointing to a different 
dtest branch that seems identical except for the last {{@jira_ticket}} tags. I 
have rebased the branches (on my repo, 
[here|https://github.com/adelapena/cassandra/tree/CASSANDRA-16446-review] and 
[here|https://github.com/adelapena/cassandra-dtest/tree/CASSANDRA-16446-review])
 and I'm running that final CI round:
 * [circle j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/193/workflows/15564af1-1247-4b27-99f8-bb04c38b3ae6]
 * [circle j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/193/workflows/75dbbd96-ade0-4ef0-82d1-96aab0e68fe4]
 * [jenkins|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/399/pipeline]




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-24 Thread Andres de la Peña (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289861#comment-17289861
 ] 

Andres de la Peña commented on CASSANDRA-16446:
---

Great. I'm running a final CI round 
[here|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/398/pipeline]
 after the rebase, just in case; I can commit once it finishes.




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-18 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286857#comment-17286857
 ] 

Berenguer Blasi commented on CASSANDRA-16446:
-

I'd say it LGTM. There are some timeouts, and then the failures from 16411, 
which is still waiting to be merged in.




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-18 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286486#comment-17286486
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16446:
-

New Jenkins CI run submitted 
[here|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/387/]




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-17 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286283#comment-17286283
 ] 

Berenguer Blasi commented on CASSANDRA-16446:
-

Ah, it failed because of the rename we did... A new Jenkins run should be clean 
now.




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-17 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286146#comment-17286146
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16446:
-

Jenkins run pushed 
[here|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/385/]




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-16 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285597#comment-17285597
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16446:
-

I did a first pass and left a few small comments/questions, but in general it 
looks good.

We need a second reviewer. [~adelapena], you were revising the testing for 
repair if I recall correctly; do you think you will have time to check this 
one too, please?




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-16 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285298#comment-17285298
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16446:
-

No worries at all, I just wanted to be sure you don't have any new changes in 
mind to add :) Thanks for confirming!




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-16 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285291#comment-17285291
 ] 

Berenguer Blasi commented on CASSANDRA-16446:
-

Mmmm, yes it is. Weird, I must have missed moving the status forward. 
Apologies.




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-16 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285257#comment-17285257
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16446:
-

This is ready for review, right? I ask because I see it is still in status 
"work in progress".




[jira] [Commented] (CASSANDRA-16446) Parent repair sessions leak may lead to node long pauses

2021-02-15 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284611#comment-17284611
 ] 

Berenguer Blasi commented on CASSANDRA-16446:
-

All praise and glory to [~jtgrabowski] for the original solution :-)
