[ https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Knighton updated CASSANDRA-13873:
--------------------------------------
    Reproduced In: 3.11.0, 3.10, 4.0  (was: 3.10, 3.11.0, 4.0)
           Status: Patch Available  (was: In Progress)

It looks like this situation can occur when referencing canonical sstables.

As far as I can tell, the issue reproduces only when we have an sstable in a lifecycle transaction with no referencers other than its selfref. If the lifecycle transaction updates this sstable, we'll put a new instance of the sstable reader in the tracker. This causes no problems when getting live sstables, but the canonical sstables can also include sstable readers from the compacting set. In this case, the sstable reader that got updated will still be in the compacting set, but we won't be able to reference it when we try to select and reference canonical sstables, since its instance tidier ran when its last ref was released in the lifecycle transaction. Note that the global tidier doesn't run, since the updated sstable reader is still referenced.

In the multiple-scrub reproduction provided above, the scrubs will eventually proceed once the lifecycle transaction finishes, since it will put an updated sstable reader in the tracker. If there is a situation where a lifecycle transaction needs to select canonical sstables in order to proceed, this could cause a deadlock.

I pushed a branch at [c13873-2.2|https://github.com/jkni/cassandra/commit/ba70f70d97f648037e742a16bfdf1c8002d2be9c] that implements the simplest fix I can think of. The patch references the original sstables involved in a lifecycle transaction when we create the transaction, releasing these references whenever we do postCleanup or cancel an sstable reader from a transaction.

I merged this forward and tests came back clean on all active branches.

I'm not sure if there is some existing mechanism that should cover this case - maybe [~krummas] knows from reviewing [CASSANDRA-9699]?
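For readers following the reference-counting argument, here is a minimal Java sketch of the failure mode and of the idea behind the patch. It is a toy model, not Cassandra's code: `RefSketch`, `Reader`, `tryRef`, `release`, and `tryRefAll` are hypothetical names made up for illustration, and the real reader, tracker, and lifecycle transaction classes are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RefSketch {
    /** Hypothetical stand-in for one sstable reader instance; NOT Cassandra's class. */
    static final class Reader {
        final String name;
        // The count starts at 1, modelling the reader's self-reference.
        private final AtomicInteger refs = new AtomicInteger(1);

        Reader(String name) { this.name = name; }

        /** Try to take a reference; fails once the count has dropped to zero. */
        boolean tryRef() {
            while (true) {
                int cur = refs.get();
                if (cur <= 0) return false; // models "instance tidier has already run"
                if (refs.compareAndSet(cur, cur + 1)) return true;
            }
        }

        /** Release one reference. */
        void release() {
            if (refs.decrementAndGet() < 0)
                throw new IllegalStateException("double release of " + name);
        }
    }

    /** Models "select and reference": succeeds only if every reader can be referenced. */
    static boolean tryRefAll(List<Reader> readers) {
        List<Reader> taken = new ArrayList<>();
        for (Reader r : readers) {
            if (!r.tryRef()) {
                for (Reader t : taken) t.release(); // roll back a partial acquisition
                return false;
            }
            taken.add(r);
        }
        // A real caller would hold these refs; the sketch releases them immediately.
        for (Reader t : taken) t.release();
        return true;
    }

    /** No extra reference: the update drops the only ref while the reader is still "compacting". */
    static boolean withoutFix() {
        Reader original = new Reader("mc-5");
        List<Reader> compacting = Collections.singletonList(original);
        original.release();           // last ref released when the transaction updates the reader
        return tryRefAll(compacting); // cannot be referenced any more
    }

    /** Idea of the patch: the transaction references the original when it is created. */
    static boolean withFix() {
        Reader original = new Reader("mc-5");
        boolean held = original.tryRef(); // reference taken at transaction creation
        List<Reader> compacting = Collections.singletonList(original);
        original.release();               // the update releases the selfref; the extra ref remains
        boolean ok = tryRefAll(compacting);
        if (held) original.release();     // models release in postCleanup or on cancel
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + withoutFix()); // false
        System.out.println("with fix: " + withFix());       // true
    }
}
```

In this model, a `false` result from `tryRefAll` corresponds to the "Spinning trying to capture readers" warning in the report: the caller keeps retrying against a reader that is still listed but can no longer be referenced.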
> Ref bug in Scrub
> ----------------
>
>                 Key: CASSANDRA-13873
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: T Jake Luciani
>            Assignee: Joel Knighton
>            Priority: Critical
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't
> happen on 3.0.X. I'm not sure if/if not this happens with compactions too
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 NoSpamLogger.java:97 - Spinning trying to capture readers
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>
> This released table has a selfRef of 0 but is in the Tracker