[ https://issues.apache.org/jira/browse/CASSANDRA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200962#comment-13200962 ]

Peter Schuller commented on CASSANDRA-3832:
-------------------------------------------

A concern here, though, is that to make this scale on very large 
clusters, you probably want to limit the number of schema migration attempts 
in progress for a given schema version, not just the number outstanding 
against a single node.
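
To make the distinction concrete, here is a minimal, hypothetical sketch of a 
per-schema-version cap (the class, the method names, and the cap of 3 are all 
invented for illustration and are not part of the attached patch; I am also 
assuming schema versions are UUIDs):

{code}
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: cap the number of in-flight migration pulls
// per schema version, as seen from this node.
public class PerVersionThrottle
{
    private static final int MAX_INFLIGHT_PER_VERSION = 3; // assumed cap

    private final ConcurrentMap<UUID, AtomicInteger> inflight =
            new ConcurrentHashMap<UUID, AtomicInteger>();

    /** Returns true if another pull for this schema version may start. */
    public boolean tryAcquire(UUID version)
    {
        AtomicInteger count = inflight.get(version);
        if (count == null)
        {
            AtomicInteger fresh = new AtomicInteger();
            AtomicInteger prev = inflight.putIfAbsent(version, fresh);
            count = (prev == null) ? fresh : prev;
        }
        while (true)
        {
            int current = count.get();
            if (current >= MAX_INFLIGHT_PER_VERSION)
                return false; // at the cap; skip this pull for now
            if (count.compareAndSet(current, current + 1))
                return true;
        }
    }

    /** Call when a pull finishes, whether it succeeded or failed. */
    public void release(UUID version)
    {
        AtomicInteger count = inflight.get(version);
        if (count != null)
            count.decrementAndGet();
    }
}
{code}

The awkward part is the release-and-retry path when a node dies mid-pull, 
which is exactly the complication described next.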

But doing that means complicating the code so that we don't fail to migrate 
to a new schema just because one node happened to go down right after we were 
notified that, say, 500 nodes are alive with a schema we don't recognize.

For now, I will proceed with de-duplicating (endpoint, schema) pairs rather 
than with global throttling.
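
By contrast, the de-duplication I have in mind is roughly this (again a 
hypothetical sketch, not the attached patch; names are invented and I am 
assuming schema versions are UUIDs):

{code}
import java.net.InetAddress;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch only: remember which (endpoint, schema version)
// pulls are already in flight and never submit a second one for the
// same pair.
public class SchemaPullDedup
{
    private final ConcurrentMap<String, Boolean> inflight =
            new ConcurrentHashMap<String, Boolean>();

    private static String key(InetAddress endpoint, UUID version)
    {
        return endpoint.getHostAddress() + "/" + version;
    }

    /** Returns true if the caller should go ahead and submit a pull. */
    public boolean markInflight(InetAddress endpoint, UUID version)
    {
        return inflight.putIfAbsent(key(endpoint, version), Boolean.TRUE) == null;
    }

    /** Call when the pull completes, successfully or not. */
    public void done(InetAddress endpoint, UUID version)
    {
        inflight.remove(key(endpoint, version));
    }
}
{code}

The point is that putIfAbsent makes the check-and-mark atomic, so concurrent 
onAlive notifications for the same node and version result in at most one pull.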

                
> gossip stage backed up due to migration manager future de-ref 
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3832
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3832
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Blocker
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3832-trunk-dontwaitonfuture.txt
>
>
> This is just bootstrapping a ~180-node trunk cluster. After a while, the
> node I was on was stuck thinking all nodes were down, because the gossip
> stage was backed up: it was spending a long time (multiple seconds or
> more; RPC timeout, I suppose) doing the following. A cluster-wide restart
> brought things back to normal. I have not investigated further.
> {code}
> "GossipStage:1" daemon prio=10 tid=0x00007f9d5847a800 nid=0xa6fc waiting on 
> condition [0x000000004345f000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000005029ad1c0> (a 
> java.util.concurrent.FutureTask$Sync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>       at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>       at 
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:364)
>       at 
> org.apache.cassandra.service.MigrationManager.rectifySchema(MigrationManager.java:132)
>       at 
> org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:75)
>       at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:802)
>       at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:918)
>       at 
> org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
> {code}
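
For context, the stack above shows the gossip stage blocking in 
FBUtilities.waitOnFuture until the schema pull completes. The attachment name 
suggests the fix direction: submit the migration task and return without 
de-referencing the future. A minimal sketch of that shape (hypothetical; the 
class and executor here are assumptions, not the actual patch):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: fire-and-forget submission so the caller
// (e.g. the gossip stage) never blocks on the pull's completion.
public class AsyncSchemaPull
{
    private static final ExecutorService MIGRATION_STAGE =
            Executors.newSingleThreadExecutor();

    public static Future<?> submitPull(Runnable pullTask)
    {
        // Hand back the future for callers that want it, but never
        // call get() on it from the gossip stage.
        return MIGRATION_STAGE.submit(pullTask);
    }
}
{code}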
