[ https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Knighton updated CASSANDRA-13873:
--------------------------------------
    Reproduced In: 3.11.0, 3.10, 4.0  (was: 3.10, 3.11.0, 4.0)
           Status: Patch Available  (was: In Progress)

It looks like this situation can occur when referencing canonical sstables. As 
far as I can tell, the issue reproduces only when an sstable in a lifecycle 
transaction has no references other than its selfref. If the lifecycle 
transaction updates this sstable, we put a new instance of the sstable reader 
in the tracker. This causes no problems when getting live sstables, but the 
canonical sstables can also include sstable readers from the compacting set. 
In this case, the sstable reader that got updated is still in the compacting 
set, but we can't reference it when we try to select and reference canonical 
sstables, because its instance tidier ran when its last ref was released in 
the lifecycle transaction. Note that the global tidier doesn't run, since the 
updated sstable reader is still referenced.

With the multiple-scrub reproduction from the issue description, the scrubs 
eventually proceed once the lifecycle transaction finishes, since it puts an 
updated sstable reader in the tracker. If a lifecycle transaction ever needed 
to select canonical sstables in order to proceed, this could cause a deadlock.
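
To make the sequence above easier to follow, here is a minimal toy model in 
Java. It is only a sketch: ToyReader, canCapture and simulate are hypothetical 
names invented for illustration, not Cassandra's SSTableReader, Ref or Tracker 
classes, and the "live" and "compacting" lists merely stand in for the 
tracker's view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class CanonicalRefDemo {
    /** Toy stand-in for one sstable reader instance (hypothetical, not Cassandra's class). */
    static final class ToyReader {
        final String name;
        // Outstanding references to this instance; starts at 1 for the self-ref.
        private final AtomicInteger refs = new AtomicInteger(1);

        ToyReader(String name) { this.name = name; }

        /** Takes a reference unless the count has already reached zero. */
        boolean tryRef() {
            while (true) {
                int cur = refs.get();
                if (cur <= 0) return false; // models "instance tidier has already run"
                if (refs.compareAndSet(cur, cur + 1)) return true;
            }
        }

        void release() {
            if (refs.decrementAndGet() < 0)
                throw new IllegalStateException("double release of " + name);
        }
    }

    /** Returns true only if every reader can be referenced; always releases what it took. */
    static boolean canCapture(List<ToyReader> readers) {
        List<ToyReader> taken = new ArrayList<>();
        for (ToyReader r : readers) {
            if (!r.tryRef()) {
                taken.forEach(ToyReader::release);
                return false;
            }
            taken.add(r);
        }
        taken.forEach(ToyReader::release);
        return true;
    }

    /** Replays the scenario; result[0] = live capture ok, result[1] = canonical capture ok. */
    static boolean[] simulate() {
        ToyReader original = new ToyReader("mc-5 (original instance)");
        ToyReader updated = new ToyReader("mc-5 (updated instance)");

        List<ToyReader> live = new ArrayList<>();
        live.add(original);
        List<ToyReader> compacting = new ArrayList<>();
        compacting.add(original); // the transaction marks the original as compacting

        // The transaction updates the sstable: a new instance replaces the old one
        // in the live set, and the old instance's last ref (its self-ref) is released.
        live.set(0, updated);
        original.release();

        // The canonical view still yields the original via the compacting set.
        return new boolean[] { canCapture(live), canCapture(compacting) };
    }

    public static void main(String[] args) {
        boolean[] result = simulate();
        System.out.println("live capture ok: " + result[0]);
        System.out.println("canonical capture ok: " + result[1]);
    }
}
```

In this model the live capture succeeds while the canonical capture fails, 
which corresponds to the "Spinning trying to capture readers" warning: a 
caller that retries will keep failing until the transaction finishes.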

I pushed a branch at 
[c13873-2.2|https://github.com/jkni/cassandra/commit/ba70f70d97f648037e742a16bfdf1c8002d2be9c]
 that implements the simplest fix I can think of. The patch references the 
original sstables involved in a lifecycle transaction when we create the 
transaction, releasing these references whenever we do postCleanup or cancel an 
sstable reader from a transaction. I merged this forward and tests came back 
clean on all active branches. I'm not sure if there is some existing mechanism 
that should cover this case - maybe [~krummas] knows from reviewing 
[CASSANDRA-9699]?
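
The idea behind the patch can be sketched with a similar toy model. This is 
not the code in the linked commit: Reader and Txn are hypothetical classes, and 
only the names postCleanup and cancel are borrowed from the description above. 
The sketch assumes the transaction takes one extra reference per original 
reader at creation and drops it on cancel or postCleanup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class TransactionRefDemo {
    /** Toy ref-counted reader (hypothetical); the count starts at 1 for the self-ref. */
    static final class Reader {
        final String name;
        private final AtomicInteger refs = new AtomicInteger(1);

        Reader(String name) { this.name = name; }

        boolean tryRef() {
            while (true) {
                int cur = refs.get();
                if (cur <= 0) return false;
                if (refs.compareAndSet(cur, cur + 1)) return true;
            }
        }

        void release() {
            if (refs.decrementAndGet() < 0)
                throw new IllegalStateException("double release of " + name);
        }
    }

    /** Toy transaction that pins its original readers for its whole lifetime. */
    static final class Txn {
        private final List<Reader> originals = new ArrayList<>();

        Txn(List<Reader> readers) {
            for (Reader r : readers) {
                if (!r.tryRef()) {
                    originals.forEach(Reader::release); // undo partial acquisition
                    throw new IllegalStateException("could not reference " + r.name);
                }
                originals.add(r);
            }
        }

        /** Removes one reader from the transaction and drops the extra ref held for it. */
        void cancel(Reader r) {
            if (originals.remove(r)) r.release();
        }

        /** Drops all extra refs once the transaction is done. */
        void postCleanup() {
            originals.forEach(Reader::release);
            originals.clear();
        }
    }

    /** result[0] = referenceable during the transaction, result[1] = referenceable after it. */
    static boolean[] run() {
        Reader original = new Reader("mc-5");
        Txn txn = new Txn(Collections.singletonList(original));

        original.release(); // the update drops the self-ref, as in the failing case

        boolean during = original.tryRef(); // the transaction's ref keeps the instance alive
        if (during) original.release();

        txn.postCleanup();
        boolean after = original.tryRef(); // last ref gone, so this fails

        return new boolean[] { during, after };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("referenceable during txn: " + result[0]);
        System.out.println("referenceable after cleanup: " + result[1]);
    }
}
```

With the extra reference in place, the original instance stays referenceable 
for as long as it can appear in the compacting set, and becomes unreferenceable 
only after the transaction has released it.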

> Ref bug in Scrub
> ----------------
>
>                 Key: CASSANDRA-13873
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: T Jake Luciani
>            Assignee: Joel Knighton
>            Priority: Critical
>
> I'm hitting a Ref bug when many scrubs run against a node.  This doesn't 
> happen on 3.0.X.  I'm not sure whether it also happens with compactions, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker


