[ 
https://issues.apache.org/jira/browse/CASSANDRA-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003401#comment-14003401
 ] 

Joshua McKenzie commented on CASSANDRA-7262:
--------------------------------------------

closeSession is called from multiple places that aren't synchronized (while one 
is).  Another approach to preventing this would be to change all possible entry 
points to being synchronized (convict, onError) but that's a higher level of 
granularity to take care of the potential race between various completion / 
error possibilities.

Having said that, it looks like there's other issues in play here as the 
exception posted in the ticket is only the 1st and only occurs once.  From the 
log attached:
{code:title=other exceptions}
ERROR [CompactionExecutor:382] 2014-05-17 01:17:53,157 CassandraDaemon.java 
(line 198) Exception in thread Thread[CompactionExecutor:382,1,main]
java.lang.AssertionError: Reference counter -1 for 
/mnt/ssd1/cassandra/data/ldn_production/historical_accounts/ldn_production-historical_accounts-jb-83888-Data.db
   at 
org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1107)
   at 
org.apache.cassandra.io.sstable.SSTableReader.releaseReferences(SSTableReader.java:1429)
   at 
org.apache.cassandra.db.compaction.CompactionController.close(CompactionController.java:207)
   at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:220)
   at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
   at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
   at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
   at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
   ...
{code}
While the reference issue on closeSession() chain could lead us into a 
situation where other releases fail to call tidy() etc, given that the 
CompactionTask operation leads to the 1st violation in reference count would 
imply to me that we have 2 separate issues on this ticket.  I haven't 
reproduced the compaction reference error nor have I looked deeply into that 
yet.

> During streaming: java.lang.AssertionError: Reference counter -1
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-7262
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7262
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.07, x86-64 Ubuntu 12.04.4, Oracle java 
> 1.7.0_45
>            Reporter: Duncan Sands
>            Assignee: Joshua McKenzie
>            Priority: Minor
>             Fix For: 2.0.8, 2.1 rc1
>
>         Attachments: 7262_v1.txt, system.log.gz
>
>
> Got this assertion failure this weekend during repair:
> ERROR [STREAM-IN-/192.168.21.14] 2014-05-17 01:17:52,332 StreamSession.java 
> (line 420) [Stream #3a3ac8a2-dd50-11e3-b3c1-6bf6dccd6457] Streaming error 
> occurred
> java.lang.RuntimeException: Outgoing stream handler has been closed
>         at 
> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:170)
>         at 
> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:483)
>         at 
> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:372)
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:289)
>         at java.lang.Thread.run(Thread.java:744)
> ERROR [STREAM-IN-/192.168.21.14] 2014-05-17 01:17:52,350 CassandraDaemon.java 
> (line 198) Exception in thread Thread[STREAM-IN-/192.168.21.14,5,RMI Runtime]
> java.lang.AssertionError: Reference counter -1 for 
> /mnt/ssd1/cassandra/data/ldn_production/historical_accounts/ldn_production-historical_accounts-jb-79827-Data.db
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1107)
>         at 
> org.apache.cassandra.streaming.StreamTransferTask.abort(StreamTransferTask.java:80)
>         at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:322)
>         at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:425)
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:300)
>         at java.lang.Thread.run(Thread.java:744)
> followed by a few more (the reference counter got down to -3).  Got the same 
> kind of assertion failure on one other node (in a different data centre; 
> there are 21 nodes altogether distributed over 4 data centres).
> I've attached the relevant part of the log.  It starts quite a bit before the 
> assertion failure at the first exception on this node ("Cannot proceed on 
> repair because a neighbor ... is dead"), and finishes a few hours afterwards 
> when the node was restarted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to