[jira] [Updated] (CASSANDRA-9279) Gossip (and mutations) lock up on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-9279: Component/s: Lifecycle, Coordination

Key: CASSANDRA-9279
URL: https://issues.apache.org/jira/browse/CASSANDRA-9279
Project: Cassandra
Issue Type: Bug
Components: Coordination, Lifecycle
Reporter: Sebastian Estevez
Assignee: Paulo Motta
Attachments: Screen Shot 2015-04-30 at 4.41.57 PM.png

Cluster running 2.0.14.352 on EC2 (c3.4xlarge instances). Two nodes out of eight exhibited the following behavior.

When starting up the node, we noticed it was gray in OpsCenter, while another monitoring tool showed it as up. It turned out gossip tasks were piling up, and we could see the following in system.log:

{code}
WARN [GossipTasks:1] 2015-04-30 20:22:29,512 Gossiper.java (line 671) Gossip stage has 4270 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2015-04-30 20:22:30,612 Gossiper.java (line 671) Gossip stage has 4272 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2015-04-30 20:22:31,713 Gossiper.java (line 671) Gossip stage has 4273 pending tasks; skipping status check (no nodes will be marked down)
...
{code}

and nodetool tpstats shows pending tasks for gossip (and mutations):

{code}
GossipStage                       1      3904       29384         0                 0
{code}

The CPUs are inactive (see attachment), and dstat output confirms the node is mostly idle:

{code}
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  2   0  97   0   0   0|1324k 1381k|   0     0 |   0     0 |6252  5548
  0   0 100   0   0   0|   0    64k|  42k 1017k|   0     0 |3075  2537
  0   0  99   0   0   0|   0  8192B|  39k  794k|   0     0 |6999  7039
  0   0 100   0   0   0|   0     0 |  39k  759k|   0     0 |3067  2726
  0   0  99   0   0   0|   0   184k|  48k 1086k|   0     0 |4829  4178
  0   0  99   0   0   0|   0  8192B|  34k  802k|   0     0 |1671  1240
  0   0 100   0   0   0|   0  8192B|  48k 1067k|   0     0 |1878  1193
{code}

I managed to grab a thread dump:
https://gist.githubusercontent.com/anonymous/3b7b4698c32032603493/raw/read.md
and dmesg:
https://gist.githubusercontent.com/anonymous/5982b15337c9afbd5d49/raw/f3c2e4411b9d59e90f4615d93c7c1ad25922e170/read.md

Restarting the node solved the issue (it came up normally). We don't know what is causing it, but apparently (per the thread dump) gossip threads are blocked writing to the system keyspace, and those writes are waiting on the commitlog.

Gossip:
{code}
"GossipStage:1" daemon prio=10 tid=0x7ffa23471800 nid=0xa13fa waiting on condition [0x7ff9cbe26000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0005d3f50960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:351)
        at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:336)
        at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:211)
        at org.apache.cassandra.cql3.statements.ModificationStatement.executeInternal(ModificationStatement.java:709)
        at org.apache.cassandra.cql3.QueryProcessor.processInternal(QueryProcessor.java:208)
        at org.apache.cassandra.db.SystemKeyspace.updatePeerInfo(SystemKeyspace.java:379)
        - locked <0x0005d3f41ed8> (a java.lang.Class for org.apache.cassandra.db.SystemKeyspace)
        at org.apache.cassandra.service.StorageService.updatePeerInfo(StorageService.java:1414)
        at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1524)
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1350)
        at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1083)
        at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1065)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1023)
{code}
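The shape of the problem in the trace above: the GossipStage thread holds the SystemKeyspace class monitor (the `locked ... java.lang.Class` frame) while parking on a shared read lock inside Keyspace.apply. If some other thread is holding that lock's write lock (e.g. during a memtable/commitlog segment switch) and not releasing it, the single gossip thread stays parked and every queued gossip task piles up behind it. The following is a minimal standalone sketch of that blocking pattern, not Cassandra code; the class and thread names are hypothetical stand-ins:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GossipLockDemo {
    // Stand-in for the ReentrantReadWriteLock taken in Keyspace.apply.
    static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();

    public static void main(String[] args) throws Exception {
        CountDownLatch writerHolding = new CountDownLatch(1);

        // Stand-in for a flush/commitlog thread that holds the write lock too long.
        Thread writer = new Thread(() -> {
            switchLock.writeLock().lock();
            writerHolding.countDown();
            try {
                TimeUnit.MILLISECONDS.sleep(1500);  // simulates a stalled exclusive holder
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                switchLock.writeLock().unlock();
            }
        }, "FlushWriter-stand-in");
        writer.start();
        writerHolding.await();

        // Stand-in for GossipStage:1: take a class monitor (like the locked
        // SystemKeyspace class), then park on the read lock (like Keyspace.apply).
        Thread gossip = new Thread(() -> {
            synchronized (GossipLockDemo.class) {
                switchLock.readLock().lock();   // parks until the write lock is released
                switchLock.readLock().unlock();
            }
        }, "GossipStage-stand-in");
        gossip.start();

        Thread.sleep(300);
        // While the writer holds the lock, the gossip thread is parked in
        // WAITING state, just like the jstack frames above.
        System.out.println("gossip state: " + gossip.getState());
        writer.join();
        gossip.join();
        System.out.println("done");
    }
}
```

Any other thread that tries `synchronized (GossipLockDemo.class)` in the meantime blocks on the monitor as well, which is how one stuck writer fans out into thousands of pending gossip tasks.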
Aleksey Yeschenko updated CASSANDRA-9279: Fix Version/s: (was: 2.1.x)
Jonathan Ellis updated CASSANDRA-9279: Assignee: Paulo Motta (was: Benedict)
Jonathan Ellis updated CASSANDRA-9279: Fix Version/s: (was: 2.0.x) 2.1.x
Philip Thompson updated CASSANDRA-9279: Fix Version/s: 2.0.x
Philip Thompson updated CASSANDRA-9279: Assignee: Brandon Williams
Brandon Williams updated CASSANDRA-9279: Assignee: (was: Brandon Williams)
Sebastian Estevez updated CASSANDRA-9279: Summary: Gossip (and mutations) lock up on startup (was: Gossip locks up on startup)