[ 
https://issues.apache.org/jira/browse/CASSANDRA-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160323#comment-15160323
 ] 

Jeff Jirsa commented on CASSANDRA-11209:
----------------------------------------

Similar to CASSANDRA-10510 as well 

> SSTable ancestor leaked reference
> ---------------------------------
>
>                 Key: CASSANDRA-11209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11209
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Jose Fernandez
>            Assignee: Marcus Eriksson
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> We're running a fork of 2.1.13 that adds the TimeWindowCompactionStrategy 
> from [~jjirsa]. We ran 4 clusters without any issues for many months, until 
> a few weeks ago, when we started scheduling incremental repairs every 
> 24 hours (previously we didn't run any repairs at all).
> Since then we have been seeing big discrepancies between LiveDiskSpaceUsed, 
> TotalDiskSpaceUsed, and the actual size of the files on disk. Restarting the 
> node brings the numbers back in sync. We also noticed that when this bug 
> happens there are several ancestors that don't get cleaned up. A restart 
> queues up a lot of compactions that slowly eat away the ancestors.
> I looked at the code and noticed that we only decrease the LiveTotalDiskUsed 
> metric in the SSTableDeletingTask. Since no errors are being logged, I'm 
> assuming that for some reason this task is not getting queued up. If I 
> understand correctly, the task is only queued when the reference count for 
> the SSTable reaches 0. This leads us to believe that something during 
> repairs and/or compactions is leaking a reference to the ancestor SSTable.
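
To illustrate the mechanism the description suspects, here is a minimal sketch of reference-counted cleanup. It is not Cassandra code: RefLeakSketch, TrackedFile, and liveBytes are hypothetical names, and the only assumption carried over from the description is that the metric is decremented solely when the reference count reaches 0. Under that assumption, a single unmatched ref() is enough to keep the file and its bytes accounted for until the process restarts.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class RefLeakSketch {
    // Hypothetical stand-in for a live-disk-usage metric.
    static final AtomicLong liveBytes = new AtomicLong();

    // Hypothetical stand-in for an SSTable guarded by a reference count.
    static final class TrackedFile {
        private final long sizeOnDisk;
        // Starts at 1: the owner's own reference.
        private final AtomicInteger refs = new AtomicInteger(1);

        TrackedFile(long sizeOnDisk) {
            this.sizeOnDisk = sizeOnDisk;
            liveBytes.addAndGet(sizeOnDisk);
        }

        void ref() {
            refs.incrementAndGet();
        }

        // Cleanup (here, just the metric decrement) runs only when the
        // last reference is released.
        void unref() {
            if (refs.decrementAndGet() == 0) {
                liveBytes.addAndGet(-sizeOnDisk);
            }
        }
    }

    // Every ref() has a matching unref(): the metric returns to 0.
    static long balanced() {
        liveBytes.set(0);
        TrackedFile f = new TrackedFile(100);
        f.ref();   // e.g. a repair or compaction takes a reference
        f.unref(); // ... and releases it
        f.unref(); // owner releases once the file has been compacted away
        return liveBytes.get();
    }

    // One ref() is never released: the count stops at 1 and the
    // cleanup never runs, so the metric stays inflated.
    static long leaked() {
        liveBytes.set(0);
        TrackedFile f = new TrackedFile(100);
        f.ref();   // reference taken, matching unref() never happens
        f.unref(); // owner releases, but the count is still 1
        return liveBytes.get();
    }

    public static void main(String[] args) {
        System.out.println("balanced: " + balanced());
        System.out.println("leaked:   " + leaked());
    }
}
```

In the leaked case nothing throws and nothing is logged, which would match the absence of errors reported above.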



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
