[jira] [Commented] (CASSANDRA-12281) Gossip blocks on startup when another node is bootstrapping

2016-10-31 Thread Andy Peckys (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622189#comment-15622189
 ] 

Andy Peckys commented on CASSANDRA-12281:
-

It's the same for me, only 1 keyspace

> Gossip blocks on startup when another node is bootstrapping
> ---
>
> Key: CASSANDRA-12281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eric Evans
>Assignee: Stefan Podkowinski
> Attachments: restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are 
> less than 1 minute.  However, when another node in the cluster is 
> bootstrapping, the same node startup takes nearly 30 minutes to complete, the 
> apparent result of gossip blocking on pending range calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   1840 0
>  0
> ReadStage 0 0   2350 0
>  0
> RequestResponseStage  0 0 53 0
>  0
> ReadRepairStage   0 0  1 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> HintedHandoff 0 0 44 0
>  0
> MiscStage 0 0  0 0
>  0
> CompactionExecutor3 3395 0
>  0
> MemtableReclaimMemory 0 0 30 0
>  0
> PendingRangeCalculator1 2 29 0
>  0
> GossipStage   1  5602164 0
>  0
> MigrationStage0 0  0 0
>  0
> MemtablePostFlush 0 0111 0
>  0
> ValidationExecutor0 0  0 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0 30 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x7fe4cd54b000 
> nid=0xea9 waiting on condition [0x7fddcf883000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0004c1e922c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>   at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
>   at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
>   at 
> org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
>   at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
>   at 
> org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossip

[jira] [Commented] (CASSANDRA-12281) Gossip blocks on startup when another node is bootstrapping

2016-10-19 Thread Andy Peckys (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589212#comment-15589212
 ] 

Andy Peckys commented on CASSANDRA-12281:
-

I just experienced that, had the PendingRangeCalculator block gossip and it 
backed up gossip on all nodes in the cluster. Out of 300+ nodes i had to stop 
and start gossip on over a 3rd of nodes to get them back into sync.

> Gossip blocks on startup when another node is bootstrapping
> ---
>
> Key: CASSANDRA-12281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eric Evans
>Assignee: Joel Knighton
>Priority: Minor
> Attachments: restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are 
> less than 1 minute.  However, when another node in the cluster is 
> bootstrapping, the same node startup takes nearly 30 minutes to complete, the 
> apparent result of gossip blocking on pending range calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   1840 0
>  0
> ReadStage 0 0   2350 0
>  0
> RequestResponseStage  0 0 53 0
>  0
> ReadRepairStage   0 0  1 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> HintedHandoff 0 0 44 0
>  0
> MiscStage 0 0  0 0
>  0
> CompactionExecutor3 3395 0
>  0
> MemtableReclaimMemory 0 0 30 0
>  0
> PendingRangeCalculator1 2 29 0
>  0
> GossipStage   1  5602164 0
>  0
> MigrationStage0 0  0 0
>  0
> MemtablePostFlush 0 0111 0
>  0
> ValidationExecutor0 0  0 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0 30 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x7fe4cd54b000 
> nid=0xea9 waiting on condition [0x7fddcf883000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0004c1e922c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>   at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
>   at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
>   at 
> org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
>   at org.apache.cassandra.gms

[jira] [Commented] (CASSANDRA-12281) Gossip blocks on startup when another node is bootstrapping

2016-10-18 Thread Andy Peckys (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587266#comment-15587266
 ] 

Andy Peckys commented on CASSANDRA-12281:
-

I have 300+ node cluster in AWS that is also seeing this issue

> Gossip blocks on startup when another node is bootstrapping
> ---
>
> Key: CASSANDRA-12281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eric Evans
>Assignee: Joel Knighton
>Priority: Minor
> Attachments: restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are 
> less than 1 minute.  However, when another node in the cluster is 
> bootstrapping, the same node startup takes nearly 30 minutes to complete, the 
> apparent result of gossip blocking on pending range calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   1840 0
>  0
> ReadStage 0 0   2350 0
>  0
> RequestResponseStage  0 0 53 0
>  0
> ReadRepairStage   0 0  1 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> HintedHandoff 0 0 44 0
>  0
> MiscStage 0 0  0 0
>  0
> CompactionExecutor3 3395 0
>  0
> MemtableReclaimMemory 0 0 30 0
>  0
> PendingRangeCalculator1 2 29 0
>  0
> GossipStage   1  5602164 0
>  0
> MigrationStage0 0  0 0
>  0
> MemtablePostFlush 0 0111 0
>  0
> ValidationExecutor0 0  0 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0 30 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x7fe4cd54b000 
> nid=0xea9 waiting on condition [0x7fddcf883000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0004c1e922c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>   at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
>   at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
>   at 
> org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
>   at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
>   at 
> org.apache.cassa