[ https://issues.apache.org/jira/browse/CASSANDRA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027675#comment-15027675 ]

Ariel Weisberg commented on CASSANDRA-10688:
--------------------------------------------

As near as I can tell, the stack overflow is being used as a bound for code 
that walks an object graph depth first, looking for a path from an object's 
outgoing references back to itself. What gets printed isn't a stack trace; 
it's the graph path that was walked (up until the overflow). I suspect the 
overflow is due to the depth of the graph: because the search is depth first, 
any moderately large linked list is going to overflow pretty quickly.

It's also using Stack, which extends the synchronized Vector; we should 
probably replace it with ArrayDeque.
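To illustrate both points, here is a minimal sketch of an iterative walk over a reference graph using an explicit ArrayDeque, so the search depth is bounded by the heap rather than the thread stack. The Node type and reaches method are hypothetical stand-ins for illustration, not the actual Ref.java code:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an object in the reference graph.
class Node
{
    final List<Node> refs;
    Node(List<Node> refs) { this.refs = refs; }
}

public class GraphWalk
{
    // Depth-first search with an explicit ArrayDeque instead of
    // java.util.Stack (synchronized, extends Vector) or recursion
    // (bounded by the thread stack, so a long linked list overflows it).
    static boolean reaches(Node root, Node target)
    {
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty())
        {
            Node n = stack.pop();
            if (n == target)
                return true;
            if (!visited.add(n))
                continue; // already explored this object
            for (Node ref : n.refs)
                stack.push(ref);
        }
        return false;
    }

    public static void main(String[] args)
    {
        // A long "linked list": recursive DFS would blow the thread stack
        // at this depth, but the explicit deque grows on the heap.
        Node tail = new Node(List.of());
        Node head = tail;
        for (int i = 0; i < 1_000_000; i++)
            head = new Node(List.of(head));
        System.out.println(reaches(head, tail)); // prints "true"
    }
}
```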

This is debug code that only runs if {{-Dcassandra.debugrefcount=true}}, so 
this isn't an issue in production deployments. [~jjordan] any idea why that 
would be set in your experiment?

For debug purposes the code works as designed: it recovers from the stack 
overflow and continues searching the graph, pruning it at the point where the 
stack overflowed. The only real issue is that the error may be too noisy.

I think we might want to rate limit it, using the first N entries in the 
graph path as a key. I'll put that together.
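Roughly the shape I have in mind, as a sketch only (class, field, and constant names here are made up for illustration, not the eventual patch): key each report on the first N entries of the walked path and drop repeats within an interval.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical rate limiter for the leak-detector error: paths that share
// the same first-N-entry prefix are logged at most once per interval.
public class LeakLogLimiter
{
    private static final int KEY_PREFIX = 5; // first N path entries form the key
    private static final long INTERVAL_NANOS = 60L * 1_000_000_000L; // once a minute per key

    private final Map<List<String>, Long> lastLogged = new ConcurrentHashMap<>();

    // Returns true if a report for this walked path should be logged now.
    boolean shouldLog(List<String> path)
    {
        // Copy the prefix so the map key is immutable and hashable.
        List<String> key = List.copyOf(path.subList(0, Math.min(KEY_PREFIX, path.size())));
        long now = System.nanoTime();
        Long prev = lastLogged.get(key);
        if (prev != null && now - prev < INTERVAL_NANOS)
            return false; // same prefix seen recently; suppress
        lastLogged.put(key, now);
        return true;
    }
}
```

The prefix works as a key because the noisy paths all start with the same few fields (runOnClose, andThen, cache, ...) before diverging into the long run of Node.next entries, so deduplicating on the head of the path collapses them into one report.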

> Stack overflow from SSTableReader$InstanceTidier.runOnClose in Leak Detector
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10688
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Ariel Weisberg
>             Fix For: 3.0.1, 3.1
>
>
> Running some tests against cassandra-3.0 
> 9fc957cf3097e54ccd72e51b2d0650dc3e83eae0
> The tests are just running cassandra-stress write and read while adding and 
> removing nodes from the cluster.  After the test runs when I go back through 
> logs I find the following Stackoverflow fairly often:
> ERROR [Strong-Reference-Leak-Detector:1] 2015-11-11 00:04:10,638  
> Ref.java:413 - Stackoverflow [private java.lang.Runnable 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.runOnClose,
>  final java.lang.Runnable 
> org.apache.cassandra.io.sstable.format.SSTableReader$DropPageCache.andThen, 
> final org.apache.cassandra.cache.InstrumentingCache 
> org.apache.cassandra.io.sstable.SSTableRewriter$InvalidateKeys.cache, private 
> final org.apache.cassandra.cache.ICache 
> org.apache.cassandra.cache.InstrumentingCache.map, private final 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap 
> org.apache.cassandra.cache.ConcurrentLinkedHashCache.map, final 
> com.googlecode.concurrentlinkedhashmap.LinkedDeque 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.evictionDeque, 
> com.googlecode.concurrentlinkedhashmap.Linked 
> com.googlecode.concurrentlinkedhashmap.LinkedDeque.first, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> ....... (repeated a whole bunch more) .... 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.next, 
> final java.lang.Object 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node.key, 
> public final byte[] org.apache.cassandra.cache.KeyCacheKey.key



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)