[ 
https://issues.apache.org/jira/browse/CASSANDRA-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173695#comment-15173695
 ] 

Marcus Eriksson commented on CASSANDRA-11209:
---------------------------------------------

bq. This sounds like you should not only avoid scheduling repairs on a node 
that's already running them, but also on both its adjacent nodes, in order to 
avoid hitting the SSTable leak bug.
Correct: avoid running repair on any node that stores any of the ranges the 
repairing node stores.
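
As a rough illustration of that rule, the sketch below computes which nodes share a range with a repairing node. It assumes a single datacenter, one token per node (no vnodes), and SimpleStrategy-style placement where each range lives on its primary node and the next RF-1 nodes clockwise. The function names and the integer node indices are hypothetical and are not part of Cassandra or nodetool.

```python
def replicas_for_primary(primary, num_nodes, rf):
    """Nodes holding the range whose primary replica is `primary` (assumed placement)."""
    return {(primary + k) % num_nodes for k in range(rf)}


def nodes_sharing_ranges(node, num_nodes, rf):
    """All other nodes that store at least one range that `node` stores."""
    # Under the assumed placement, `node` stores the ranges whose primaries
    # are node, node-1, ..., node-(rf-1) on the ring.
    stored_primaries = [(node - k) % num_nodes for k in range(rf)]
    sharing = set()
    for primary in stored_primaries:
        sharing |= replicas_for_primary(primary, num_nodes, rf)
    sharing.discard(node)
    return sharing


def can_start_repair(candidate, repairing, num_nodes, rf):
    """True if `candidate` shares no range with any node already repairing."""
    for busy in repairing:
        if candidate == busy:
            return False
        if candidate in nodes_sharing_ranges(busy, num_nodes, rf):
            return False
    return True
```

Under these assumptions, RF=2 reduces to the two adjacent nodes, while RF=3 excludes two nodes on each side of the repairing node.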

> SSTable ancestor leaked reference
> ---------------------------------
>
>                 Key: CASSANDRA-11209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11209
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Jose Fernandez
>            Assignee: Marcus Eriksson
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> We're running a fork of 2.1.13 that adds the TimeWindowCompactionStrategy 
> from [~jjirsa]. We had been running 4 clusters without any issues for many 
> months until, a few weeks ago, we started scheduling incremental repairs every 
> 24 hours (previously we didn't run any repairs at all).
> Since then we started noticing big discrepancies in the LiveDiskSpaceUsed, 
> TotalDiskSpaceUsed, and actual size of files on disk. The numbers are brought 
> back in sync by restarting the node. We also noticed that when this bug 
> happens there are several ancestors that don't get cleaned up. A restart will 
> queue up a lot of compactions that slowly eat away the ancestors.
> I looked at the code and noticed that we only decrease the LiveTotalDiskUsed 
> metric in the SSTableDeletingTask. Since we have no errors being logged, I'm 
> assuming that for some reason this task is not getting queued up. If I 
> understand correctly this only happens when the reference count for the 
> SSTable reaches 0. So this is leading us to believe that something during 
> repairs and/or compactions is causing a reference leak to the ancestor table.
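
The suspected mechanism can be modelled with a toy reference counter. This is only a sketch of the idea described above, assuming deletion and the metric decrement both happen when the count reaches 0. The class, its methods, and the `disk_used` key are hypothetical and do not reflect Cassandra's actual classes.

```python
class TrackedFile:
    """Toy stand-in for a reference-counted SSTable (not Cassandra's API)."""

    def __init__(self, size, metrics):
        self.size = size
        self.metrics = metrics
        self.refs = 1  # the reference held by the live set
        self.deleted = False
        metrics["disk_used"] += size

    def acquire(self):
        if self.refs <= 0:
            raise RuntimeError("file already released")
        self.refs += 1

    def release(self):
        if self.refs <= 0:
            raise RuntimeError("released more often than acquired")
        self.refs -= 1
        if self.refs == 0:
            # Only here is the file deleted and the metric decremented.
            self.deleted = True
            self.metrics["disk_used"] -= self.size


metrics = {"disk_used": 0}
ancestor = TrackedFile(100, metrics)
ancestor.acquire()   # some operation (for example a repair) takes a reference
ancestor.release()   # compaction drops the live-set reference
# The extra reference is never released, so deletion never runs
# and the metric stays at 100 until the process restarts.
```

If the second holder did call `release()`, the count would reach 0 and the metric would drop back to 0, which is consistent with the numbers only resyncing after a restart.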



