[ https://issues.apache.org/jira/browse/CASSANDRA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027675#comment-15027675 ]
Ariel Weisberg commented on CASSANDRA-10688:
--------------------------------------------

Near as I can tell, the stack overflow is being used as a bound for code that walks an object graph, doing a depth-first search for a path from an object's outgoing references back to the object itself. What gets printed isn't a stack trace; it's the graph that was walked (up until the overflow). I suspect the overflow is due to the depth of the graph: since the search is depth first, any moderately large linked list will overflow the thread stack pretty quickly. The code also uses Stack, which extends Vector; we should probably replace it with ArrayDeque.

This is debug code that only runs if {{-Dcassandra.debugrefcount=true}} is set, so this isn't an issue in production deployments. [~jjordan], any idea why that would be set in your experiment?

For debug purposes the code works as designed: it can recover from the stack overflow and continue searching the graph, pruning the graph at the point where the stack overflowed. The only real issue is that the error is too noisy. I think we might want to rate limit it, using the first N entries in the graph as a key. I'll put that together.

> Stack overflow from SSTableReader$InstanceTidier.runOnClose in Leak Detector
> ----------------------------------------------------------------------------
>
>             Key: CASSANDRA-10688
>             URL: https://issues.apache.org/jira/browse/CASSANDRA-10688
>         Project: Cassandra
>      Issue Type: Bug
>        Reporter: Jeremiah Jordan
>        Assignee: Ariel Weisberg
>         Fix For: 3.0.1, 3.1
>
> Running some tests against cassandra-3.0 9fc957cf3097e54ccd72e51b2d0650dc3e83eae0
> The tests are just running cassandra-stress write and read while adding and removing nodes from the cluster.
> After the test runs, when I go back through the logs I find the following stack overflow fairly often:
>
> ERROR [Strong-Reference-Leak-Detector:1] 2015-11-11 00:04:10,638 Ref.java:413 - Stackoverflow [
>   private java.lang.Runnable org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.runOnClose,
>   final java.lang.Runnable org.apache.cassandra.io.sstable.format.SSTableReader$DropPageCache.andThen,
>   final org.apache.cassandra.cache.InstrumentingCache org.apache.cassandra.io.sstable.SSTableRewriter$InvalidateKeys.cache,
>   private final org.apache.cassandra.cache.ICache org.apache.cassandra.cache.InstrumentingCache.map,
>   private final com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap org.apache.cassandra.cache.ConcurrentLinkedHashCache.map,
>   final com.googlecode.concurrentlinkedhashmap.LinkedDeque com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.evictionDeque,
>   com.googlecode.concurrentlinkedhashmap.Linked com.googlecode.concurrentlinkedhashmap.LinkedDeque.first,
>   com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
>   ....... (the ConcurrentLinkedHashMap$Node.next entry repeated a whole bunch more) ....
>   com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next,
>   final java.lang.Object com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.key,
>   public final byte[] org.apache.cassandra.cache.KeyCacheKey.key

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
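The depth-first walk described in the comment can be sketched with an explicit ArrayDeque in place of recursion (and in place of Stack, whose Vector parent synchronizes every operation). This is a minimal illustration under stated assumptions, not the actual Ref.java code; {{GraphNode}} and {{referencesSelf}} are hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the leak detector's graph walk, not Cassandra's code.
final class GraphWalk
{
    static final class GraphNode
    {
        final List<GraphNode> refs = new ArrayList<>(); // outgoing references
    }

    // Depth-first search for a path from root's outgoing references back to
    // root itself. An explicit ArrayDeque bounds the search by heap size
    // rather than thread stack depth, so a long linked list cannot trigger
    // a StackOverflowError the way a recursive walk can.
    static boolean referencesSelf(GraphNode root)
    {
        Deque<GraphNode> stack = new ArrayDeque<>();
        Set<GraphNode> visited = new HashSet<>();
        for (GraphNode ref : root.refs)
            stack.push(ref);
        while (!stack.isEmpty())
        {
            GraphNode node = stack.pop();
            if (node == root)
                return true;          // found a path back to the root object
            if (visited.add(node))    // skip nodes we've already expanded
                for (GraphNode ref : node.refs)
                    stack.push(ref);
        }
        return false;
    }

    public static void main(String[] args)
    {
        // A long singly-linked chain: recursion would overflow the thread
        // stack here, but the explicit deque walks it without trouble.
        GraphNode head = new GraphNode();
        GraphNode tail = head;
        for (int i = 0; i < 1_000_000; i++)
        {
            GraphNode next = new GraphNode();
            tail.refs.add(next);
            tail = next;
        }
        System.out.println(referencesSelf(head)); // prints false: no path back
        tail.refs.add(head);                      // close the loop
        System.out.println(referencesSelf(head)); // prints true
    }
}
```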
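The rate limiting proposed above could look something like the following sketch, keyed on the first N entries of the walked path so that distinct leaks still get reported while a noisy one is only logged once per interval. Class and method names here ({{PathRateLimiter}}, {{shouldLog}}) are assumptions for illustration, not the eventual patch:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of rate limiting keyed on a path prefix; not the patch.
final class PathRateLimiter
{
    private final int keyPrefix;      // first N path entries form the key
    private final long intervalNanos; // minimum time between reports per key
    private final Map<List<String>, Long> lastLogged = new ConcurrentHashMap<>();

    PathRateLimiter(int keyPrefix, long intervalNanos)
    {
        this.keyPrefix = keyPrefix;
        this.intervalNanos = intervalNanos;
    }

    // Returns true if this path should be logged now, false if suppressed.
    // The check-then-act is not atomic, which is acceptable for log
    // rate limiting: a rare duplicate report is harmless.
    boolean shouldLog(List<String> path)
    {
        // Copy the prefix so the key is immutable and safe to keep in the map.
        List<String> key = List.copyOf(path.subList(0, Math.min(keyPrefix, path.size())));
        long now = System.nanoTime();
        Long previous = lastLogged.get(key);
        if (previous != null && now - previous < intervalNanos)
            return false;
        lastLogged.put(key, now);
        return true;
    }

    public static void main(String[] args)
    {
        PathRateLimiter limiter = new PathRateLimiter(5, 60_000_000_000L); // one report per key per minute
        List<String> path = List.of(
            "SSTableReader$InstanceTidier.runOnClose",
            "SSTableReader$DropPageCache.andThen",
            "SSTableRewriter$InvalidateKeys.cache",
            "InstrumentingCache.map",
            "ConcurrentLinkedHashCache.map",
            "ConcurrentLinkedHashMap.evictionDeque");
        System.out.println(limiter.shouldLog(path)); // prints true: first sighting
        System.out.println(limiter.shouldLog(path)); // prints false: suppressed
    }
}
```

Keying on only the first few entries matters because the noisy part of the path (the repeated {{Node.next}} hops) varies in length, while the leading fields identify which leak is being reported.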