[ 
https://issues.apache.org/jira/browse/CASSANDRA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-3832:
--------------------------------------

    Attachment: CASSANDRA-3832-trunk-dontwaitonfuture.txt

Attaching a simple patch that just does not wait on the future. Since we have 
no special code path to handle timeouts anyway, this does not remove any 
failure handling that exists today, so as far as I can tell it should not 
cause any failure to reach schema agreement that we are not already 
vulnerable to.
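
To illustrate why not waiting helps, here is a minimal, self-contained sketch. 
It is not the attached patch and does not use Cassandra's classes: the names 
GossipStageSketch, drainGossipStage and MIGRATION_MILLIS are hypothetical, and 
the 100 ms migration cost is an assumption. It only models a single-threaded 
stage whose tasks either block on a future from another stage or hand the work 
off and return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the two stages; not Cassandra's actual code.
class GossipStageSketch {
    // Assumed cost of one schema migration request, in milliseconds.
    static final long MIGRATION_MILLIS = 100;

    /**
     * Pushes `events` "node is alive" notifications through a single-threaded
     * gossip stage and returns how many milliseconds the stage needed to
     * drain them. With waitOnFuture=true each gossip task blocks until its
     * migration task has finished; with false it only submits the task.
     */
    static long drainGossipStage(int events, boolean waitOnFuture) throws Exception {
        ExecutorService gossipStage = Executors.newSingleThreadExecutor();
        ExecutorService migrationStage = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            List<Future<?>> gossipTasks = new ArrayList<>();
            for (int i = 0; i < events; i++) {
                gossipTasks.add(gossipStage.submit(() -> {
                    // Hand the slow work to the other stage.
                    Future<?> migration = migrationStage.submit(() -> {
                        try {
                            Thread.sleep(MIGRATION_MILLIS);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    if (waitOnFuture) {
                        // Parks the only gossip thread until the migration is done.
                        try {
                            migration.get();
                        } catch (InterruptedException | ExecutionException e) {
                            throw new RuntimeException(e);
                        }
                    }
                }));
            }
            // Wait for the gossip stage itself, not for the migrations.
            for (Future<?> task : gossipTasks) {
                task.get();
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            // Sketch only: pending migrations are simply cancelled on shutdown.
            gossipStage.shutdownNow();
            migrationStage.shutdownNow();
        }
    }
}
```

Calling drainGossipStage(5, true) should take roughly five times the 
migration cost, because every gossip event queues behind the previous wait, 
while drainGossipStage(5, false) should return almost immediately. The 
trade-off is the one described above: the non-waiting variant never learns 
whether a migration request failed or timed out.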

Also upping the priority, since this bug causes clusters to refuse to start 
up even after full cluster restarts by the operator.
                
> gossip stage backed up due to migration manager future de-ref 
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3832
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3832
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Critical
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3832-trunk-dontwaitonfuture.txt
>
>
> This is just bootstrapping a ~180-node trunk cluster. After a while, a
> node I was on got stuck thinking all nodes were down, because the
> gossip stage was backed up: it was spending a long time (multiple
> seconds or more; I suppose the RPC timeout) doing the following.
> A cluster-wide restart brought things back to normal. I have not
> investigated further.
> {code}
> "GossipStage:1" daemon prio=10 tid=0x00007f9d5847a800 nid=0xa6fc waiting on condition [0x000000004345f000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000005029ad1c0> (a java.util.concurrent.FutureTask$Sync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>       at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>       at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:364)
>       at org.apache.cassandra.service.MigrationManager.rectifySchema(MigrationManager.java:132)
>       at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:75)
>       at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:802)
>       at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:918)
>       at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
