[ https://issues.apache.org/jira/browse/CASSANDRA-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888394#comment-15888394 ]
Greg Doermann commented on CASSANDRA-13042:
-------------------------------------------

Ok, so I am also seeing this on the other nodes (this is from 1.1.1.2, the other seed node) while this is happening and the nodes are flapping:

{code}
WARN [MessagingService-Outgoing-/1.1.1.2] 2017-02-28 16:05:20,516 OutboundTcpConnection.java:427 - Seed gossip version is -2147483648; will not connect with that version
{code}

This message seems to be the root of all the evil. I was able to stabilize things (today) once I removed 1.1.1.1 from the seed list in its own config and restarted it. Hinted handoffs passed again and everything went back to normal.

Once everything calmed down, I added 1.1.1.1 back as a seed node in the cassandra.yaml on 1.1.1.1 and did a restart (disable gossip, disable thrift, drain, restart). When it came back up: no more errors, no more problems. Things seem to be stable again.

Thinking back, this seems to be a problem every time a seed node goes down. If another node goes down while this is happening, that other node also has massive issues until we resolve the seed gossip version error.

Let me know if you need anything more from me.

> The two cassandra nodes suddenly encounter hints each other and failed replaying.
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13042
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13042
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: YheonHo.Choi
>            Priority: Critical
>         Attachments: out_2.2.2.1.txt, out_2.2.2.2.txt
>
>
> Although there were no changes to Cassandra, two nodes suddenly started accumulating hints for each other and failed to replay them.
> Commands such as disablethrift and disablegossip did not solve the problem; the only fix was a restart.
> When we checked the status of the cluster, all nodes looked UN, but describecluster showed the two nodes as unreachable from each other.
> Here is the state of Cassandra while the problem was occurring.
> IP addresses in report anonymized: > cassandra version: 2.2.5 > node 1 = 1.1.1.1 > node 2 = 1.1.1.2 > others = x.x.x.x > system.log > {code} > ## result of nodetool status on 1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:15:07,969 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:15:09,969 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [HintedHandoff:2] 2016-11-24 06:25:09,736 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 06:25:11,738 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [MemtableFlushWriter:55270] 2016-11-24 06:25:12,625 > BigTableWriter.java:184 - Writing large partition > system/hints:d640677d-f354-aa8c-be89-d2a1648c24b2 (109029803 bytes) > WARN [CompactionExecutor:37908] 2016-11-24 06:35:23,682 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (250651758 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:35:23,727 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:35:25,728 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [CompactionExecutor:37909] 2016-11-24 06:45:53,615 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (340801514 bytes) > INFO [HintedHandoff:2] 2016-11-24 06:45:53,718 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 06:45:55,719 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:37912] 2016-11-24 06:56:20,884 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (472465093 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:56:20,966 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:56:22,967 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:37911] 2016-11-24 07:07:12,568 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (577392172 bytes) > INFO [HintedHandoff:2] 2016-11-24 07:07:12,643 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 07:07:14,643 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> INFO [IndexSummaryManager:1] 2016-11-24 07:09:15,929 > IndexSummaryRedistribution.java:74 - Redistributing index summaries > ## result of nodetool status on 1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:11:37,300 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:11:39,301 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [HintedHandoff:2] 2016-11-24 06:22:17,946 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:2] 2016-11-24 06:22:19,948 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [IndexSummaryManager:1] 2016-11-24 06:27:00,177 > IndexSummaryRedistribution.java:74 - Redistributing index summaries > WARN [CompactionExecutor:2315] 2016-11-24 06:32:28,159 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (121683824 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:32:28,338 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:32:30,340 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [MemtableFlushWriter:8728] 2016-11-24 06:32:31,272 > BigTableWriter.java:184 - Writing large partition > system/hints:00444c39-e924-91b7-7868-ec4ac9a0e7a8 (108691834 bytes) > WARN [CompactionExecutor:2316] 2016-11-24 06:42:58,591 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (327308871 bytes) > INFO [HintedHandoff:2] 2016-11-24 06:42:58,661 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:2] 2016-11-24 06:43:00,662 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [MemtableFlushWriter:8739] 2016-11-24 06:52:42,921 > BigTableWriter.java:184 - Writing large partition > system/hints:00444c39-e924-91b7-7868-ec4ac9a0e7a8 (105086190 bytes) > WARN [CompactionExecutor:2316] 2016-11-24 06:53:43,722 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (449333141 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:53:43,769 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:53:45,770 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [CompactionExecutor:2316] 2016-11-24 07:04:29,379 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (460781059 bytes) > INFO [HintedHandoff:2] 2016-11-24 07:04:29,572 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:2] 2016-11-24 07:04:31,574 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [MemtableFlushWriter:8749] 2016-11-24 07:04:32,553 > BigTableWriter.java:184 - Writing large partition > system/hints:00444c39-e924-91b7-7868-ec4ac9a0e7a8 (117146652 bytes) > {code} > nodetool status > {code} > ## result of nodetool status on 1.1.1.1 > Datacenter: datacenter1 > ======================= > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns Host ID > Rack > UN 1.1.1.2 252.48 GB 256 ? > 8caa54f3-7d67-40d6-b224-8c64a1d289be rack1 > UN x.x.x.x 316.6 GB 256 ? > a08fbe58-779f-49f7-a0be-0a288d18c059 rack1 > UN x.x.x.x 240.92 GB 256 ? > d2cf2695-25b7-4aad-92ab-9a0b593df5fb rack1 > UN x.x.x.x 242.68 GB 256 ? > a01d3304-5d8d-4ac9-935f-597ebf038e70 rack1 > UN x.x.x.x 326.76 GB 256 ? > b79124e9-394c-4400-a8e7-a0c94aec6878 rack1 > UN x.x.x.x 296.61 GB 256 ? > 0ec779f9-3ee4-41a2-80b2-f27f25521964 rack1 > UN x.x.x.x 309.81 GB 256 ? > 8e6417ec-e375-40a4-bb8d-3d99148846ff rack1 > UN x.x.x.x 312.28 GB 256 ? > b39ee3e9-8f69-4f5c-8473-0e906c2301ca rack1 > UN x.x.x.x 299.9 GB 256 ? > 0ab413a9-5608-4d7b-97dc-6ed655e76dd0 rack1 > UN x.x.x.x 241.81 GB 256 ? > 81d4608f-d1df-4f52-b104-5f1c7a3b4c96 rack1 > UN x.x.x.x 228.76 GB 256 ? > afc0ae8e-1126-426f-aa5b-0fce5578531c rack1 > UN x.x.x.x 301.13 GB 256 ? > 62323ea7-8d66-46fd-93e1-ff9bbf772c17 rack1 > UN x.x.x.x 307.39 GB 256 ? > 7939c565-dc6e-43ad-b6ed-cfa3a19b20ab rack1 > UN x.x.x.x 259.21 GB 256 ? 
> afcdc0a9-de5a-446d-868e-ee1dabfc0107 rack1 > UN x.x.x.x 294.16 GB 256 ? > 5582b905-7136-4e87-b1a9-d450ea2eb304 rack1 > UN x.x.x.x 285.28 GB 256 ? > 6dabba2e-085d-497b-a97e-b6ebf65258d6 rack1 > UN x.x.x.x 305.59 GB 256 ? > 10b51458-800b-46f2-a80a-c2f2bbe1e46d rack1 > UN x.x.x.x 317.64 GB 256 ? > 79a69fe9-de49-49d3-b8cb-fbc59e08a821 rack1 > UN x.x.x.x 303.85 GB 256 ? > 71db8e6a-cdbb-4924-860c-34e40731719f rack1 > Note: Non-system keyspaces don't have the same replication settings, > effective ownership information is meaningless > ## result of nodetool status on 1.1.1.2 > Datacenter: datacenter1 > ======================= > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns Host ID > Rack > UN x.x.x.x 252.48 GB 256 ? > 8caa54f3-7d67-40d6-b224-8c64a1d289be rack1 > UN x.x.x.x 316.6 GB 256 ? > a08fbe58-779f-49f7-a0be-0a288d18c059 rack1 > UN x.x.x.x 240.92 GB 256 ? > d2cf2695-25b7-4aad-92ab-9a0b593df5fb rack1 > UN x.x.x.x 242.68 GB 256 ? > a01d3304-5d8d-4ac9-935f-597ebf038e70 rack1 > UN 1.1.1.1 326.76 GB 256 ? > b79124e9-394c-4400-a8e7-a0c94aec6878 rack1 > UN x.x.x.x 296.61 GB 256 ? > 0ec779f9-3ee4-41a2-80b2-f27f25521964 rack1 > UN x.x.x.x 309.81 GB 256 ? > 8e6417ec-e375-40a4-bb8d-3d99148846ff rack1 > UN x.x.x.x 312.28 GB 256 ? > b39ee3e9-8f69-4f5c-8473-0e906c2301ca rack1 > UN x.x.x.x 299.9 GB 256 ? > 0ab413a9-5608-4d7b-97dc-6ed655e76dd0 rack1 > UN x.x.x.x 241.81 GB 256 ? > 81d4608f-d1df-4f52-b104-5f1c7a3b4c96 rack1 > UN x.x.x.x 228.76 GB 256 ? > afc0ae8e-1126-426f-aa5b-0fce5578531c rack1 > UN x.x.x.x 301.13 GB 256 ? > 62323ea7-8d66-46fd-93e1-ff9bbf772c17 rack1 > UN x.x.x.x 307.39 GB 256 ? > 7939c565-dc6e-43ad-b6ed-cfa3a19b20ab rack1 > UN x.x.x.x 259.21 GB 256 ? > afcdc0a9-de5a-446d-868e-ee1dabfc0107 rack1 > UN x.x.x.x 294.16 GB 256 ? > 5582b905-7136-4e87-b1a9-d450ea2eb304 rack1 > UN x.x.x.x 285.28 GB 256 ? > 6dabba2e-085d-497b-a97e-b6ebf65258d6 rack1 > UN x.x.x.x 305.59 GB 256 ? 
> 10b51458-800b-46f2-a80a-c2f2bbe1e46d rack1
> UN x.x.x.x 317.64 GB 256 ? 79a69fe9-de49-49d3-b8cb-fbc59e08a821 rack1
> UN x.x.x.x 303.85 GB 256 ? 71db8e6a-cdbb-4924-860c-34e40731719f rack1
> Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
> {code}
> nodetool describecluster
> {code}
> ## result of nodetool describecluster on 1.1.1.1
> Cluster Information:
>     Name: metric
>     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9: [x.x.x.x, x.x.x.x,
>         x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x,
>         x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x]
>         UNREACHABLE: [1.1.1.2]
> ## result of nodetool describecluster on 1.1.1.2
> Cluster Information:
>     Name: metric
>     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9: [x.x.x.x, x.x.x.x,
>         x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x,
>         x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x]
>         UNREACHABLE: [1.1.1.1]
> {code}
> We checked whether the schema differs between the nodes.
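Editor's sketch: the two describecluster outputs above each list the other node as UNREACHABLE. A minimal Python sketch of how that comparison could be automated follows. It assumes the output of `nodetool describecluster` has already been captured as text on each node and that the `UNREACHABLE: [...]` line looks like the excerpts in this report; the function names `unreachable_nodes` and `mutually_unreachable` are hypothetical helpers, not part of Cassandra.

```python
import re
from typing import Dict, List, Tuple

# Matches the "UNREACHABLE: [a, b, ...]" line as it appears in the report above.
_UNREACHABLE = re.compile(r"UNREACHABLE:\s*\[([^\]]*)\]")


def unreachable_nodes(describecluster_output: str) -> List[str]:
    """Return the addresses listed on the UNREACHABLE line, or [] if there is none."""
    match = _UNREACHABLE.search(describecluster_output)
    if match is None:
        return []
    return [addr.strip() for addr in match.group(1).split(",") if addr.strip()]


def mutually_unreachable(outputs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Find node pairs that each report the other as unreachable.

    `outputs` maps a node address to the describecluster text captured on that node.
    """
    views = {node: set(unreachable_nodes(text)) for node, text in outputs.items()}
    pairs = set()
    for node, unreachable in views.items():
        for other in unreachable:
            if node in views.get(other, set()):
                pairs.add(tuple(sorted((node, other))))
    return sorted(pairs)


if __name__ == "__main__":
    sample = {
        "1.1.1.1": "Schema versions:\n  c777bf0c: [x.x.x.x]\n  UNREACHABLE: [1.1.1.2]\n",
        "1.1.1.2": "Schema versions:\n  c777bf0c: [x.x.x.x]\n  UNREACHABLE: [1.1.1.1]\n",
    }
    print(mutually_unreachable(sample))
```

A pair reported by this check matches the symptom described here: `nodetool status` shows both nodes as UN while each sees the other as unreachable.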
> nodetool gossipinfo > {code} > ## result of nodetool gossipinfo on 1.1.1.1 > /x.x.x.x > generation:1464511828 > heartbeat:46798705 > STATUS:14:NORMAL,-1022166408914492069 > LOAD:46798599:3.40042922062E11 > SCHEMA:37196731:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46798704:0.0 > NET_VERSION:1:9 > HOST_ID:2:a08fbe58-779f-49f7-a0be-0a288d18c059 > RPC_READY:44:true > TOKENS:13:<hidden> > /1.1.1.1 > generation:1471493280 > heartbeat:25624228 > STATUS:25498:NORMAL,-1023351769271276256 > LOAD:25624146:3.51052126949E11 > SCHEMA:16021810:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.1 > SEVERITY:25624230:0.0 > NET_VERSION:1:9 > HOST_ID:2:b79124e9-394c-4400-a8e7-a0c94aec6878 > RPC_READY:25524:true > TOKENS:25497:<hidden> > /x.x.x.x > generation:1467205802 > heartbeat:38628440 > STATUS:14:NORMAL,-1124164164231176551 > LOAD:38628359:3.06404316564E11 > SCHEMA:29026044:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38628439:0.0 > NET_VERSION:1:9 > HOST_ID:2:6dabba2e-085d-497b-a97e-b6ebf65258d6 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1479090215 > heartbeat:2581923 > STATUS:16:NORMAL,-1034930499164644146 > LOAD:2581842:2.60571265374E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2581922:0.0 > NET_VERSION:1:9 > HOST_ID:2:a01d3304-5d8d-4ac9-935f-597ebf038e70 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089630 > heartbeat:2583693 > STATUS:16:NORMAL,-1104754836621297034 > LOAD:2583656:2.59767873976E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2583692:0.0 > NET_VERSION:1:9 > 
HOST_ID:2:81d4608f-d1df-4f52-b104-5f1c7a3b4c96 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1471482865 > heartbeat:25655714 > STATUS:28484:NORMAL,-1005894856009564957 > LOAD:25655674:3.35493584628E11 > SCHEMA:16053376:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:25655716:0.0 > NET_VERSION:1:9 > HOST_ID:2:b39ee3e9-8f69-4f5c-8473-0e906c2301ca > RPC_READY:28512:true > TOKENS:28483:<hidden> > /1.1.1.2 > generation:1479090411 > heartbeat:2581314 > STATUS:14:NORMAL,-1016248794247274800 > LOAD:2581277:2.71127001342E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.2 > SEVERITY:2581313:0.0 > NET_VERSION:1:9 > HOST_ID:2:8caa54f3-7d67-40d6-b224-8c64a1d289be > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465163903 > heartbeat:44821598 > STATUS:14:NORMAL,-1077736070944113862 > LOAD:44821597:3.26381003723E11 > SCHEMA:35219220:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:44821595:0.0 > NET_VERSION:1:9 > HOST_ID:2:71db8e6a-cdbb-4924-860c-34e40731719f > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467203191 > heartbeat:38636328 > STATUS:14:NORMAL,-100689147463310850 > LOAD:38636153:3.15710669743E11 > SCHEMA:29033962:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38636327:0.0 > NET_VERSION:1:9 > HOST_ID:2:5582b905-7136-4e87-b1a9-d450ea2eb304 > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465206313 > heartbeat:44692819 > STATUS:16:NORMAL,-101785227522144798 > LOAD:44692800:3.18542273643E11 > SCHEMA:35090498:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:44692818:0.0 > NET_VERSION:1:9 > 
HOST_ID:2:0ec779f9-3ee4-41a2-80b2-f27f25521964 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201538 > heartbeat:38641336 > STATUS:16:NORMAL,-1004145257859511465 > LOAD:38641252:3.22115490439E11 > SCHEMA:29038976:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38641338:0.0 > NET_VERSION:1:9 > HOST_ID:2:0ab413a9-5608-4d7b-97dc-6ed655e76dd0 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201918 > heartbeat:38640043 > STATUS:16:NORMAL,-1068518488491269501 > LOAD:38640012:3.30172166769E11 > SCHEMA:29037717:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38640045:0.0 > NET_VERSION:1:9 > HOST_ID:2:7939c565-dc6e-43ad-b6ed-cfa3a19b20ab > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089944 > heartbeat:2582739 > STATUS:14:NORMAL,-109330739329126024 > LOAD:2582564:2.58856479776E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2582738:0.0 > NET_VERSION:1:9 > HOST_ID:2:d2cf2695-25b7-4aad-92ab-9a0b593df5fb > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1474550583 > heartbeat:16351053 > STATUS:14:NORMAL,-1012946818076902491 > LOAD:16350923:3.32929676598E11 > SCHEMA:6748586:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:16351052:0.042265426367521286 > NET_VERSION:1:9 > HOST_ID:2:8e6417ec-e375-40a4-bb8d-3d99148846ff > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1464678970 > heartbeat:46292266 > STATUS:14:NORMAL,-100392380890457765 > LOAD:46292236:3.23397923906E11 > SCHEMA:36689950:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46292265:0.0 > 
NET_VERSION:1:9 > HOST_ID:2:62323ea7-8d66-46fd-93e1-ff9bbf772c17 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467202117 > heartbeat:38639452 > STATUS:16:NORMAL,-1023597752726656847 > LOAD:38639304:3.28121029924E11 > SCHEMA:29037141:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38639451:0.0 > NET_VERSION:1:9 > HOST_ID:2:10b51458-800b-46f2-a80a-c2f2bbe1e46d > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479087904 > heartbeat:2588919 > STATUS:14:NORMAL,-1013592683752719902 > LOAD:2588748:2.78416391781E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2588918:0.0 > NET_VERSION:1:9 > HOST_ID:2:afcdc0a9-de5a-446d-868e-ee1dabfc0107 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1464513989 > heartbeat:46792633 > STATUS:14:NORMAL,-1063467021041506926 > LOAD:46792498:3.41276898418E11 > SCHEMA:37190355:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46792632:0.0 > NET_VERSION:1:9 > HOST_ID:2:79a69fe9-de49-49d3-b8cb-fbc59e08a821 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1479088908 > heartbeat:2585894 > STATUS:14:NORMAL,-1060190437871147906 > LOAD:2585849:2.45716342224E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2585893:0.0 > NET_VERSION:1:9 > HOST_ID:2:afc0ae8e-1126-426f-aa5b-0fce5578531c > RPC_READY:44:true > TOKENS:13:<hidden> > ## result of nodetool gossipinfo on 1.1.1.2 > /x.x.x.x > generation:1464511828 > heartbeat:46798699 > STATUS:14:NORMAL,-1022166408914492069 > LOAD:46798599:3.40042922062E11 > SCHEMA:37196731:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > 
RPC_ADDRESS:3:x.x.x.x > SEVERITY:46798698:0.0 > NET_VERSION:1:9 > HOST_ID:2:a08fbe58-779f-49f7-a0be-0a288d18c059 > RPC_READY:44:true > TOKENS:13:<hidden> > /1.1.1.1 > generation:1471493280 > heartbeat:25624222 > STATUS:25498:NORMAL,-1023351769271276256 > LOAD:25624146:3.51052126949E11 > SCHEMA:16021810:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.1 > SEVERITY:25624221:0.0 > NET_VERSION:1:9 > HOST_ID:2:b79124e9-394c-4400-a8e7-a0c94aec6878 > RPC_READY:25524:true > TOKENS:25497:<hidden> > /x.x.x.x > generation:1467205802 > heartbeat:38628440 > STATUS:14:NORMAL,-1124164164231176551 > LOAD:38628359:3.06404316564E11 > SCHEMA:29026044:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38628439:0.0 > NET_VERSION:1:9 > HOST_ID:2:6dabba2e-085d-497b-a97e-b6ebf65258d6 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1479090215 > heartbeat:2581923 > STATUS:16:NORMAL,-1034930499164644146 > LOAD:2581842:2.60571265374E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2581922:0.0 > NET_VERSION:1:9 > HOST_ID:2:a01d3304-5d8d-4ac9-935f-597ebf038e70 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089630 > heartbeat:2583690 > STATUS:16:NORMAL,-1104754836621297034 > LOAD:2583656:2.59767873976E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2583689:0.0 > NET_VERSION:1:9 > HOST_ID:2:81d4608f-d1df-4f52-b104-5f1c7a3b4c96 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1471482865 > heartbeat:25655708 > STATUS:28484:NORMAL,-1005894856009564957 > LOAD:25655674:3.35493584628E11 > SCHEMA:16053376:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > 
RPC_ADDRESS:3:x.x.x.x > SEVERITY:25655707:0.0 > NET_VERSION:1:9 > HOST_ID:2:b39ee3e9-8f69-4f5c-8473-0e906c2301ca > RPC_READY:28512:true > TOKENS:28483:<hidden> > /1.1.1.2 > generation:1479090411 > heartbeat:2581314 > STATUS:14:NORMAL,-1016248794247274800 > LOAD:2581277:2.71127001342E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.2 > SEVERITY:2581316:0.0 > NET_VERSION:1:9 > HOST_ID:2:8caa54f3-7d67-40d6-b224-8c64a1d289be > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465163903 > heartbeat:44821587 > STATUS:14:NORMAL,-1077736070944113862 > LOAD:44821415:3.26343988237E11 > SCHEMA:35219220:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:44821586:0.0 > NET_VERSION:1:9 > HOST_ID:2:71db8e6a-cdbb-4924-860c-34e40731719f > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467203191 > heartbeat:38636325 > STATUS:14:NORMAL,-100689147463310850 > LOAD:38636153:3.15710669743E11 > SCHEMA:29033962:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38636327:0.0 > NET_VERSION:1:9 > HOST_ID:2:5582b905-7136-4e87-b1a9-d450ea2eb304 > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465206313 > heartbeat:44692816 > STATUS:16:NORMAL,-101785227522144798 > LOAD:44692800:3.18542273643E11 > SCHEMA:35090498:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:44692815:0.0 > NET_VERSION:1:9 > HOST_ID:2:0ec779f9-3ee4-41a2-80b2-f27f25521964 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201538 > heartbeat:38641336 > STATUS:16:NORMAL,-1004145257859511465 > LOAD:38641252:3.22115490439E11 > SCHEMA:29038976:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > 
RPC_ADDRESS:3:x.x.x.x > SEVERITY:38641338:0.0 > NET_VERSION:1:9 > HOST_ID:2:0ab413a9-5608-4d7b-97dc-6ed655e76dd0 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201918 > heartbeat:38640040 > STATUS:16:NORMAL,-1068518488491269501 > LOAD:38640012:3.30172166769E11 > SCHEMA:29037717:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38640039:0.0 > NET_VERSION:1:9 > HOST_ID:2:7939c565-dc6e-43ad-b6ed-cfa3a19b20ab > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089944 > heartbeat:2582733 > STATUS:14:NORMAL,-109330739329126024 > LOAD:2582564:2.58856479776E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2582732:0.042265426367521286 > NET_VERSION:1:9 > HOST_ID:2:d2cf2695-25b7-4aad-92ab-9a0b593df5fb > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1474550583 > heartbeat:16351050 > STATUS:14:NORMAL,-1012946818076902491 > LOAD:16350923:3.32929676598E11 > SCHEMA:6748586:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:16351049:0.0 > NET_VERSION:1:9 > HOST_ID:2:8e6417ec-e375-40a4-bb8d-3d99148846ff > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1464678970 > heartbeat:46292263 > STATUS:14:NORMAL,-100392380890457765 > LOAD:46292236:3.23397923906E11 > SCHEMA:36689950:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46292265:0.0 > NET_VERSION:1:9 > HOST_ID:2:62323ea7-8d66-46fd-93e1-ff9bbf772c17 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467202117 > heartbeat:38639449 > STATUS:16:NORMAL,-1023597752726656847 > LOAD:38639304:3.28121029924E11 > SCHEMA:29037141:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > 
RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38639451:0.0 > NET_VERSION:1:9 > HOST_ID:2:10b51458-800b-46f2-a80a-c2f2bbe1e46d > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479087904 > heartbeat:2588919 > STATUS:14:NORMAL,-1013592683752719902 > LOAD:2588748:2.78416391781E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2588918:0.0 > NET_VERSION:1:9 > HOST_ID:2:afcdc0a9-de5a-446d-868e-ee1dabfc0107 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1464513989 > heartbeat:46792633 > STATUS:14:NORMAL,-1063467021041506926 > LOAD:46792498:3.41276898418E11 > SCHEMA:37190355:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46792632:0.0 > NET_VERSION:1:9 > HOST_ID:2:79a69fe9-de49-49d3-b8cb-fbc59e08a821 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1479088908 > heartbeat:2585891 > STATUS:14:NORMAL,-1060190437871147906 > LOAD:2585849:2.45716342224E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2585890:0.0 > NET_VERSION:1:9 > HOST_ID:2:afc0ae8e-1126-426f-aa5b-0fce5578531c > RPC_READY:44:true > TOKENS:13:<hidden> > {code} > One thing we think of as weird is that 'messeage pending' and 'MUTATION drop' > nodetool netstats > {code} > ## result of nodetool netstats on 1.1.1.1 > Mode: NORMAL > Not sending any streams. > Read Repair Statistics: > Attempted: 0 > Mismatch (Blocking): 0 > Mismatch (Background): 0 > Pool Name Active Pending Completed > Large messages n/a 7 2 > Small messages n/a 0 129707138163 > Gossip messages n/a 0 27177898 > ## result of nodetool netstats on 1.1.1.2 > Mode: NORMAL > Not sending any streams. 
> Read Repair Statistics: > Attempted: 0 > Mismatch (Blocking): 0 > Mismatch (Background): 3 > Pool Name Active Pending Completed > Large messages n/a 0 0 > Small messages n/a 484 19297343088 > Gossip messages n/a 0 2782060 > {code} > nodetool tpstats > {code} > ## result of nodetool tpstats on 1.1.1.1 > Pool Name Active Pending Completed Blocked All > time blocked > Native-Transport-Requests 0 0 55687222731 0 > 4381472 > ReadRepairStage 0 0 70 0 > 0 > ReadStage 0 0 39737862 0 > 0 > RequestResponseStage 1 0 65682103003 0 > 0 > CounterMutationStage 0 0 0 0 > 0 > MutationStage 0 0 100679076772 0 > 0 > MemtablePostFlush 0 0 215235 0 > 0 > GossipStage 0 0 26510703 0 > 0 > MiscStage 0 0 0 0 > 0 > PendingRangeCalculator 0 0 37 0 > 0 > AntiEntropyStage 0 0 0 0 > 0 > CacheCleanupExecutor 0 0 184 0 > 0 > MigrationStage 0 0 59 0 > 0 > HintedHandoff 0 0 93 0 > 0 > ValidationExecutor 0 0 0 0 > 0 > MemtableFlushWriter 0 0 74239 0 > 0 > InternalResponseStage 0 0 3636382 0 > 0 > Sampler 0 0 0 0 > 0 > MemtableReclaimMemory 0 0 74239 0 > 0 > CompactionExecutor 0 0 5694753 0 > 0 > Message type Dropped > RANGE_SLICE 0 > READ_REPAIR 0 > PAGED_RANGE 0 > READ 38268 > MUTATION 615972 > _TRACE 0 > REQUEST_RESPONSE 0 > COUNTER_MUTATION 0 > ## result of nodetool tpstats on 1.1.1.2 > Pool Name Active Pending Completed Blocked All > time blocked > Native-Transport-Requests 0 0 5783148174 0 > 970194 > ReadRepairStage 0 0 23 0 > 0 > ReadStage 0 0 227034 0 > 0 > RequestResponseStage 0 0 9656431051 0 > 0 > CounterMutationStage 0 0 0 0 > 0 > MutationStage 0 1 14494593737 0 > 0 > MemtablePostFlush 0 0 24720 0 > 0 > GossipStage 0 0 2667784 0 > 0 > MiscStage 0 0 0 0 > 0 > PendingRangeCalculator 0 0 26 0 > 0 > AntiEntropyStage 0 0 0 0 > 0 > CacheCleanupExecutor 0 0 0 0 > 0 > MigrationStage 0 0 0 0 > 0 > HintedHandoff 0 0 29 0 > 0 > ValidationExecutor 0 0 0 0 > 0 > MemtableFlushWriter 0 0 10555 0 > 0 > InternalResponseStage 0 0 3425759 0 > 0 > Sampler 0 0 0 0 > 0 > MemtableReclaimMemory 0 0 10555 0 > 0 
> CompactionExecutor                2         3         551545         0                 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> READ                         0
> MUTATION                     0
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {code}
> We found a similar issue that was reported earlier
> (https://issues.apache.org/jira/browse/CASSANDRA-4740).
> *But I think ours has a different cause, because the problem has only occurred in recent months.*
> {code}
> ## on 1.1.1.1
> $ netstat -tn | grep 1.1.1.2
> tcp        0      0 1.1.1.1:50103           1.1.1.2:7000            ESTABLISHED
> tcp        0      0 1.1.1.1:7000            1.1.1.2:45946           ESTABLISHED
> tcp        0      0 1.1.1.1:38178           1.1.1.2:7000            ESTABLISHED
> tcp        0      0 1.1.1.1:7000            1.1.1.2:59958           ESTABLISHED
> ## on 1.1.1.2
> $ netstat -tn | grep 1.1.1.2
> tcp        0      0 1.1.1.2:7000            1.1.1.16:32973          ESTABLISHED
> tcp        0      0 1.1.1.2:34929           1.1.1.16:7000           ESTABLISHED
> tcp        0     50 1.1.1.2:59270           1.1.1.16:7000           ESTABLISHED
> tcp        0      0 1.1.1.2:7000            1.1.1.1:50103           ESTABLISHED
> tcp        0      0 1.1.1.2:45946           1.1.1.1:7000            ESTABLISHED
> tcp        0      0 1.1.1.2:59958           1.1.1.1:7000            ESTABLISHED
> tcp        0      0 1.1.1.2:7000            1.1.1.16:55680          ESTABLISHED
> tcp        0      0 1.1.1.2:7000            1.1.1.1:38178           ESTABLISHED
> {code}
> The JVM and kernel are the same on every node.
> {code}
> $ uname -a
> Linux ip-10-211-195-229.ap-northeast-2.compute.internal 4.5.0-coreos-r1 #2 SMP Thu May 26 22:21:06 UTC 2016 x86_64 Linux
> $ java -version
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> $ which java
> /pang/program/jdk/bin/java
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
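Editor's sketch: the comment at the top of this message ties the flapping to the `Seed gossip version is -2147483648` warning, and the quoted system.log shows repeated `Failed replaying hints` entries. The value -2147483648 equals Java's `Integer.MIN_VALUE`, which suggests (the ticket does not confirm this) a "version unknown" sentinel rather than a real protocol version. The Python sketch below scans log lines for both symptoms. It assumes the log format matches the excerpts in this message; the helper names `seed_version_warnings` and `hint_replay_failures` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# -2147483648 is Java's Integer.MIN_VALUE; treating it as a sentinel is an assumption.
INT_MIN = -2**31

# Patterns modelled on the log excerpts quoted in this message.
_SEED_WARN = re.compile(
    r"\[MessagingService-Outgoing-/([^\]]+)\].*"
    r"Seed gossip version is (-?\d+); will not connect"
)
_HINT_FAIL = re.compile(
    r"Failed replaying hints to /([^;\s]+); aborting \((\d+) delivered\)"
)


def seed_version_warnings(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (peer, version) for every 'Seed gossip version' warning found."""
    found = []
    for line in lines:
        match = _SEED_WARN.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


def hint_replay_failures(lines: Iterable[str]) -> Counter:
    """Count failed hint replays per target address."""
    failures = Counter()
    for line in lines:
        match = _HINT_FAIL.search(line)
        if match:
            failures[match.group(1)] += 1
    return failures


if __name__ == "__main__":
    log = [
        "WARN [MessagingService-Outgoing-/1.1.1.2] 2017-02-28 16:05:20,516 "
        "OutboundTcpConnection.java:427 - Seed gossip version is -2147483648; "
        "will not connect with that version",
        "INFO [HintedHandoff:1] 2016-11-24 06:15:09,969 HintedHandOffManager.java:486 "
        "- Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : "
        "Operation timed out - received only 0 responses.",
    ]
    for peer, version in seed_version_warnings(log):
        marker = " (Integer.MIN_VALUE)" if version == INT_MIN else ""
        print(f"seed {peer}: gossip version {version}{marker}")
    print(dict(hint_replay_failures(log)))
```

Running a scan like this against system.log on each node after a seed restart could show whether the warning and the hint replay failures appear together, which is the correlation the comment describes.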