[ https://issues.apache.org/jira/browse/CASSANDRA-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jose Fernandez updated CASSANDRA-11209:
---------------------------------------
    Attachment: screenshot-2.png

> SSTable ancestor leaked reference
> ---------------------------------
>
>                 Key: CASSANDRA-11209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11209
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Jose Fernandez
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> We're running a fork of 2.1.13 that adds the TimeWindowCompactionStrategy
> from [~jjirsa]. We had been running 4 clusters without any issues for many
> months, until a few weeks ago when we started scheduling incremental repairs
> every 24 hours (previously we didn't run any repairs at all).
> Since then we have noticed big discrepancies between LiveDiskSpaceUsed,
> TotalDiskSpaceUsed, and the actual size of the files on disk. Restarting the
> node brings the numbers back in sync. We also noticed that when this bug
> happens there are several ancestors that don't get cleaned up. A restart
> queues up a lot of compactions that slowly eat away the ancestors.
> I looked at the code and noticed that we only decrease the LiveTotalDiskUsed
> metric in the SSTableDeletingTask. Since no errors are being logged, I'm
> assuming that for some reason this task is not getting queued up. If I
> understand correctly, the task is only queued when the reference count for
> the SSTable reaches 0. This leads us to believe that something during
> repairs and/or compactions is leaking a reference to the ancestor SSTable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
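
To illustrate the suspected mechanism from the report, here is a minimal sketch of how one unreleased reference could keep a disk-usage metric inflated. It is not Cassandra's code: the names RefCountLeakSketch, DiskMetric, TrackedSSTable, and liveBytesAfterCompaction are hypothetical, and the sketch assumes only what the report describes, namely that the metric is decremented by the deletion step and that the deletion step runs once the reference count reaches 0.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class RefCountLeakSketch {

    // Hypothetical stand-in for the live disk space metric.
    static final class DiskMetric {
        final AtomicLong liveBytes = new AtomicLong();
    }

    // Hypothetical reference-counted SSTable handle (not Cassandra's class).
    static final class TrackedSSTable {
        private final long sizeBytes;
        private final DiskMetric metric;
        // Starts at 1: the reference held by the live set of SSTables.
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean deleted = false;

        TrackedSSTable(long sizeBytes, DiskMetric metric) {
            this.sizeBytes = sizeBytes;
            this.metric = metric;
            metric.liveBytes.addAndGet(sizeBytes);
        }

        void acquire() {
            refs.incrementAndGet();
        }

        void release() {
            if (refs.decrementAndGet() == 0) {
                // Stands in for the deletion task: the only place in this
                // sketch where the metric is decreased.
                deleted = true;
                metric.liveBytes.addAndGet(-sizeBytes);
            }
        }

        boolean isDeleted() {
            return deleted;
        }
    }

    // Simulates a repair that references an ancestor, followed by a
    // compaction that replaces it. Returns the metric value afterwards.
    static long liveBytesAfterCompaction(boolean repairLeaksReference) {
        DiskMetric metric = new DiskMetric();
        TrackedSSTable ancestor = new TrackedSSTable(1_000L, metric);

        ancestor.acquire();              // repair takes a reference
        if (!repairLeaksReference) {
            ancestor.release();          // well-behaved repair gives it back
        }
        ancestor.release();              // compaction drops the live-set reference

        return metric.liveBytes.get();
    }

    public static void main(String[] args) {
        System.out.println("no leak: " + liveBytesAfterCompaction(false));
        System.out.println("leak:    " + liveBytesAfterCompaction(true));
    }
}
```

In the leaking case the count stops at 1, so the deletion step never runs, nothing is logged, and the metric keeps counting the ancestor's 1,000 bytes. That would be consistent with the symptoms described above, but it remains an assumption about the cause rather than a confirmed diagnosis.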