[ 
https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992973#comment-16992973
 ] 

Chris Kistner commented on CASSANDRA-14355:
-------------------------------------------

[~benedict],  thank you for the feedback.

After some extensive investigation on our end, we traced our issue to 
Cassandra Reaper terminating Cassandra repair sessions that don't finish 
within 30 minutes (the "hangingRepairTimeoutMins" default). Cassandra 3.11.4 
then doesn't close those repair threads, which in our case retain ~200MB of 
data on heap. The other annoying part is that both Cassandra and Cassandra 
Reaper log the repair session as if it had been successful, unless you go and 
look at the logs in detail.

I think I should instead create a new bug for Cassandra not closing the Repair 
session thread correctly, since the original issue's 
"io.netty.util.concurrent.FastThreadLocalThread" class was referenced from 
"Native-Transport-Requests" and "ReadStage" threads, whereas ours were all 
referenced from "Repair" threads.

We might be able to reproduce the Cassandra issue where it does not close 
Repair threads properly, first on 3.11.4 and then on 3.11.5. 
So far we've only experienced these OOME issues in client production 
environments, and right now all our clients are in a code/change freeze, so we 
won't be able to test 3.11.5 in a large production environment for about a 
month.
So for now we'll just schedule per-host and per-table repairs using cron 
scripts, repairing a different host each day of the week, instead of using 
Cassandra Reaper, which might terminate our repair sessions. 
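
A minimal sketch of what such a cron-driven schedule could look like, assuming 
a small Python wrapper around "nodetool repair" that cron runs once a day. The 
host names, keyspace and table names are placeholders, and the helper names 
(host_for_day, repair_commands) are made up for illustration; none of them come 
from our actual scripts. Repair flags such as -pr or --full are deliberately 
left out, since the right choice depends on the cluster.

```python
import datetime
import subprocess

# Placeholder values: replace with the real cluster hosts and tables.
HOSTS = ["cass-node-1", "cass-node-2", "cass-node-3"]
TABLES = [("my_keyspace", "table_a"), ("my_keyspace", "table_b")]


def host_for_day(day, hosts=HOSTS):
    """Pick one host per calendar day from the weekday index (Monday = 0).

    With fewer than seven hosts the list wraps around within the week;
    with more than seven, hosts past the seventh are never selected,
    so a different rotation would be needed.
    """
    return hosts[day.weekday() % len(hosts)]


def repair_commands(host, tables=TABLES):
    """Build one 'nodetool repair' command per (keyspace, table) pair."""
    return [
        ["nodetool", "-h", host, "repair", keyspace, table]
        for keyspace, table in tables
    ]


def main(dry_run=True):
    host = host_for_day(datetime.date.today())
    for cmd in repair_commands(host):
        if dry_run:
            # Only show what would be run.
            print(" ".join(cmd))
        else:
            # Run tables one after another; stop on the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Running the tables one at a time keeps each repair session small, and because 
nothing external enforces a timeout, no session is cut off part-way through.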

> Memory leak
> -----------
>
>                 Key: CASSANDRA-14355
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>         Environment: Debian Jessie, OpenJDK 1.8.0_151
>            Reporter: Eric Evans
>            Priority: Normal
>             Fix For: 3.11.x
>
>         Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 
> 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 
> 14-24-50.png, LongGC_Dominator-Tree.png, LongGC_Histogram.png, 
> LongGC_Problem-Suspect-1_FastThreadLocalThread.png, LongGC_nodetool_info.txt
>
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions.  Similar to 
> CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the 
> {{threadLocals}} member of the instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

