[ https://issues.apache.org/jira/browse/CASSANDRA-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786943#comment-15786943 ]
YheonHo.Choi commented on CASSANDRA-13042:
------------------------------------------

These days, we face this problem almost every day. We need a solution as soon as possible.

> The two cassandra nodes suddenly encounter hints each other and failed
> replaying.
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13042
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13042
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: YheonHo.Choi
>         Attachments: out_2.2.2.1.txt, out_2.2.2.2.txt
>
>
> Although nothing was changed in Cassandra, two nodes suddenly started
> accumulating hints for each other and failed to replay them.
> Commands such as disablethrift and disablegossip do not solve the
> problem; the only fix was a restart.
> When we check the cluster status, all nodes appear as UN, but
> describecluster on each of the two nodes shows the other as unreachable.
> Here is the state of Cassandra while the problem was occurring.
> IP addresses in the report are anonymized:
> cassandra version: 2.2.5
> node 1 = 1.1.1.1
> node 2 = 1.1.1.2
> others = x.x.x.x
> system.log
> {code}
> ## result of nodetool status on 1.1.1.1
> INFO [HintedHandoff:1] 2016-11-24 06:15:07,969 HintedHandOffManager.java:367
> - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with
> IP: /1.1.1.2
> INFO [HintedHandoff:1] 2016-11-24 06:15:09,969 HintedHandOffManager.java:486
> - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error :
> Operation timed out - received only 0 responses.
> INFO [HintedHandoff:2] 2016-11-24 06:25:09,736 HintedHandOffManager.java:367
> - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with
> IP: /1.1.1.2
> INFO [HintedHandoff:2] 2016-11-24 06:25:11,738 HintedHandOffManager.java:486
> - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error :
> Operation timed out - received only 0 responses.
> WARN [MemtableFlushWriter:55270] 2016-11-24 06:25:12,625 > BigTableWriter.java:184 - Writing large partition > system/hints:d640677d-f354-aa8c-be89-d2a1648c24b2 (109029803 bytes) > WARN [CompactionExecutor:37908] 2016-11-24 06:35:23,682 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (250651758 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:35:23,727 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:35:25,728 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:37909] 2016-11-24 06:45:53,615 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (340801514 bytes) > INFO [HintedHandoff:2] 2016-11-24 06:45:53,718 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 06:45:55,719 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:37912] 2016-11-24 06:56:20,884 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (472465093 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:56:20,966 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:56:22,967 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [CompactionExecutor:37911] 2016-11-24 07:07:12,568 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (577392172 bytes) > INFO [HintedHandoff:2] 2016-11-24 07:07:12,643 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 07:07:14,643 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [IndexSummaryManager:1] 2016-11-24 07:09:15,929 > IndexSummaryRedistribution.java:74 - Redistributing index summaries > ## result of nodetool status on 1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:11:37,300 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:11:39,301 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [HintedHandoff:2] 2016-11-24 06:22:17,946 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:2] 2016-11-24 06:22:19,948 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> INFO [IndexSummaryManager:1] 2016-11-24 06:27:00,177 > IndexSummaryRedistribution.java:74 - Redistributing index summaries > WARN [CompactionExecutor:2315] 2016-11-24 06:32:28,159 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (121683824 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:32:28,338 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:32:30,340 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [MemtableFlushWriter:8728] 2016-11-24 06:32:31,272 > BigTableWriter.java:184 - Writing large partition > system/hints:00444c39-e924-91b7-7868-ec4ac9a0e7a8 (108691834 bytes) > WARN [CompactionExecutor:2316] 2016-11-24 06:42:58,591 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (327308871 bytes) > INFO [HintedHandoff:2] 2016-11-24 06:42:58,661 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:2] 2016-11-24 06:43:00,662 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [MemtableFlushWriter:8739] 2016-11-24 06:52:42,921 > BigTableWriter.java:184 - Writing large partition > system/hints:00444c39-e924-91b7-7868-ec4ac9a0e7a8 (105086190 bytes) > WARN [CompactionExecutor:2316] 2016-11-24 06:53:43,722 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (449333141 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:53:43,769 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:53:45,770 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:2316] 2016-11-24 07:04:29,379 > BigTableWriter.java:184 - Writing large partition > system/hints:b79124e9-394c-4400-a8e7-a0c94aec6878 (460781059 bytes) > INFO [HintedHandoff:2] 2016-11-24 07:04:29,572 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:2] 2016-11-24 07:04:31,574 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.1; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [MemtableFlushWriter:8749] 2016-11-24 07:04:32,553 > BigTableWriter.java:184 - Writing large partition > system/hints:00444c39-e924-91b7-7868-ec4ac9a0e7a8 (117146652 bytes) > {code} > nodetool status > {code} > ## result of nodetool status on 1.1.1.1 > Datacenter: datacenter1 > ======================= > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns Host ID > Rack > UN 1.1.1.2 252.48 GB 256 ? > 8caa54f3-7d67-40d6-b224-8c64a1d289be rack1 > UN x.x.x.x 316.6 GB 256 ? > a08fbe58-779f-49f7-a0be-0a288d18c059 rack1 > UN x.x.x.x 240.92 GB 256 ? > d2cf2695-25b7-4aad-92ab-9a0b593df5fb rack1 > UN x.x.x.x 242.68 GB 256 ? 
> a01d3304-5d8d-4ac9-935f-597ebf038e70 rack1 > UN x.x.x.x 326.76 GB 256 ? > b79124e9-394c-4400-a8e7-a0c94aec6878 rack1 > UN x.x.x.x 296.61 GB 256 ? > 0ec779f9-3ee4-41a2-80b2-f27f25521964 rack1 > UN x.x.x.x 309.81 GB 256 ? > 8e6417ec-e375-40a4-bb8d-3d99148846ff rack1 > UN x.x.x.x 312.28 GB 256 ? > b39ee3e9-8f69-4f5c-8473-0e906c2301ca rack1 > UN x.x.x.x 299.9 GB 256 ? > 0ab413a9-5608-4d7b-97dc-6ed655e76dd0 rack1 > UN x.x.x.x 241.81 GB 256 ? > 81d4608f-d1df-4f52-b104-5f1c7a3b4c96 rack1 > UN x.x.x.x 228.76 GB 256 ? > afc0ae8e-1126-426f-aa5b-0fce5578531c rack1 > UN x.x.x.x 301.13 GB 256 ? > 62323ea7-8d66-46fd-93e1-ff9bbf772c17 rack1 > UN x.x.x.x 307.39 GB 256 ? > 7939c565-dc6e-43ad-b6ed-cfa3a19b20ab rack1 > UN x.x.x.x 259.21 GB 256 ? > afcdc0a9-de5a-446d-868e-ee1dabfc0107 rack1 > UN x.x.x.x 294.16 GB 256 ? > 5582b905-7136-4e87-b1a9-d450ea2eb304 rack1 > UN x.x.x.x 285.28 GB 256 ? > 6dabba2e-085d-497b-a97e-b6ebf65258d6 rack1 > UN x.x.x.x 305.59 GB 256 ? > 10b51458-800b-46f2-a80a-c2f2bbe1e46d rack1 > UN x.x.x.x 317.64 GB 256 ? > 79a69fe9-de49-49d3-b8cb-fbc59e08a821 rack1 > UN x.x.x.x 303.85 GB 256 ? > 71db8e6a-cdbb-4924-860c-34e40731719f rack1 > Note: Non-system keyspaces don't have the same replication settings, > effective ownership information is meaningless > ## result of nodetool status on 1.1.1.2 > Datacenter: datacenter1 > ======================= > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns Host ID > Rack > UN x.x.x.x 252.48 GB 256 ? > 8caa54f3-7d67-40d6-b224-8c64a1d289be rack1 > UN x.x.x.x 316.6 GB 256 ? > a08fbe58-779f-49f7-a0be-0a288d18c059 rack1 > UN x.x.x.x 240.92 GB 256 ? > d2cf2695-25b7-4aad-92ab-9a0b593df5fb rack1 > UN x.x.x.x 242.68 GB 256 ? > a01d3304-5d8d-4ac9-935f-597ebf038e70 rack1 > UN 1.1.1.1 326.76 GB 256 ? > b79124e9-394c-4400-a8e7-a0c94aec6878 rack1 > UN x.x.x.x 296.61 GB 256 ? > 0ec779f9-3ee4-41a2-80b2-f27f25521964 rack1 > UN x.x.x.x 309.81 GB 256 ? 
> 8e6417ec-e375-40a4-bb8d-3d99148846ff rack1 > UN x.x.x.x 312.28 GB 256 ? > b39ee3e9-8f69-4f5c-8473-0e906c2301ca rack1 > UN x.x.x.x 299.9 GB 256 ? > 0ab413a9-5608-4d7b-97dc-6ed655e76dd0 rack1 > UN x.x.x.x 241.81 GB 256 ? > 81d4608f-d1df-4f52-b104-5f1c7a3b4c96 rack1 > UN x.x.x.x 228.76 GB 256 ? > afc0ae8e-1126-426f-aa5b-0fce5578531c rack1 > UN x.x.x.x 301.13 GB 256 ? > 62323ea7-8d66-46fd-93e1-ff9bbf772c17 rack1 > UN x.x.x.x 307.39 GB 256 ? > 7939c565-dc6e-43ad-b6ed-cfa3a19b20ab rack1 > UN x.x.x.x 259.21 GB 256 ? > afcdc0a9-de5a-446d-868e-ee1dabfc0107 rack1 > UN x.x.x.x 294.16 GB 256 ? > 5582b905-7136-4e87-b1a9-d450ea2eb304 rack1 > UN x.x.x.x 285.28 GB 256 ? > 6dabba2e-085d-497b-a97e-b6ebf65258d6 rack1 > UN x.x.x.x 305.59 GB 256 ? > 10b51458-800b-46f2-a80a-c2f2bbe1e46d rack1 > UN x.x.x.x 317.64 GB 256 ? > 79a69fe9-de49-49d3-b8cb-fbc59e08a821 rack1 > UN x.x.x.x 303.85 GB 256 ? > 71db8e6a-cdbb-4924-860c-34e40731719f rack1 > Note: Non-system keyspaces don't have the same replication settings, > effective ownership information is meaningless > {code} > nodetool describecluster > {code} > ## result of nodetool describecluster on 1.1.1.1 > Cluster Information: > Name: metric > Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch > Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > Schema versions: > c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9: [x.x.x.x, x.x.x.x, > x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, > x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x] > UNREACHABLE: [1.1.1.2] > ## result of nodetool describecluster on 1.1.1.2 > Cluster Information: > Name: metric > Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch > Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > Schema versions: > c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9: [x.x.x.x, x.x.x.x, > x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, > x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x, x.x.x.x] > 
UNREACHABLE: [1.1.1.1] > {code} > We've looked at whether schema is different. > nodetool gossipinfo > {code} > ## result of nodetool gossipinfo on 1.1.1.1 > /x.x.x.x > generation:1464511828 > heartbeat:46798705 > STATUS:14:NORMAL,-1022166408914492069 > LOAD:46798599:3.40042922062E11 > SCHEMA:37196731:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46798704:0.0 > NET_VERSION:1:9 > HOST_ID:2:a08fbe58-779f-49f7-a0be-0a288d18c059 > RPC_READY:44:true > TOKENS:13:<hidden> > /1.1.1.1 > generation:1471493280 > heartbeat:25624228 > STATUS:25498:NORMAL,-1023351769271276256 > LOAD:25624146:3.51052126949E11 > SCHEMA:16021810:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.1 > SEVERITY:25624230:0.0 > NET_VERSION:1:9 > HOST_ID:2:b79124e9-394c-4400-a8e7-a0c94aec6878 > RPC_READY:25524:true > TOKENS:25497:<hidden> > /x.x.x.x > generation:1467205802 > heartbeat:38628440 > STATUS:14:NORMAL,-1124164164231176551 > LOAD:38628359:3.06404316564E11 > SCHEMA:29026044:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38628439:0.0 > NET_VERSION:1:9 > HOST_ID:2:6dabba2e-085d-497b-a97e-b6ebf65258d6 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1479090215 > heartbeat:2581923 > STATUS:16:NORMAL,-1034930499164644146 > LOAD:2581842:2.60571265374E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2581922:0.0 > NET_VERSION:1:9 > HOST_ID:2:a01d3304-5d8d-4ac9-935f-597ebf038e70 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089630 > heartbeat:2583693 > STATUS:16:NORMAL,-1104754836621297034 > LOAD:2583656:2.59767873976E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > 
RPC_ADDRESS:3:x.x.x.x > SEVERITY:2583692:0.0 > NET_VERSION:1:9 > HOST_ID:2:81d4608f-d1df-4f52-b104-5f1c7a3b4c96 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1471482865 > heartbeat:25655714 > STATUS:28484:NORMAL,-1005894856009564957 > LOAD:25655674:3.35493584628E11 > SCHEMA:16053376:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:25655716:0.0 > NET_VERSION:1:9 > HOST_ID:2:b39ee3e9-8f69-4f5c-8473-0e906c2301ca > RPC_READY:28512:true > TOKENS:28483:<hidden> > /1.1.1.2 > generation:1479090411 > heartbeat:2581314 > STATUS:14:NORMAL,-1016248794247274800 > LOAD:2581277:2.71127001342E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.2 > SEVERITY:2581313:0.0 > NET_VERSION:1:9 > HOST_ID:2:8caa54f3-7d67-40d6-b224-8c64a1d289be > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465163903 > heartbeat:44821598 > STATUS:14:NORMAL,-1077736070944113862 > LOAD:44821597:3.26381003723E11 > SCHEMA:35219220:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:44821595:0.0 > NET_VERSION:1:9 > HOST_ID:2:71db8e6a-cdbb-4924-860c-34e40731719f > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467203191 > heartbeat:38636328 > STATUS:14:NORMAL,-100689147463310850 > LOAD:38636153:3.15710669743E11 > SCHEMA:29033962:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38636327:0.0 > NET_VERSION:1:9 > HOST_ID:2:5582b905-7136-4e87-b1a9-d450ea2eb304 > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465206313 > heartbeat:44692819 > STATUS:16:NORMAL,-101785227522144798 > LOAD:44692800:3.18542273643E11 > SCHEMA:35090498:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > 
RPC_ADDRESS:3:x.x.x.x > SEVERITY:44692818:0.0 > NET_VERSION:1:9 > HOST_ID:2:0ec779f9-3ee4-41a2-80b2-f27f25521964 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201538 > heartbeat:38641336 > STATUS:16:NORMAL,-1004145257859511465 > LOAD:38641252:3.22115490439E11 > SCHEMA:29038976:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38641338:0.0 > NET_VERSION:1:9 > HOST_ID:2:0ab413a9-5608-4d7b-97dc-6ed655e76dd0 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201918 > heartbeat:38640043 > STATUS:16:NORMAL,-1068518488491269501 > LOAD:38640012:3.30172166769E11 > SCHEMA:29037717:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38640045:0.0 > NET_VERSION:1:9 > HOST_ID:2:7939c565-dc6e-43ad-b6ed-cfa3a19b20ab > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089944 > heartbeat:2582739 > STATUS:14:NORMAL,-109330739329126024 > LOAD:2582564:2.58856479776E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2582738:0.0 > NET_VERSION:1:9 > HOST_ID:2:d2cf2695-25b7-4aad-92ab-9a0b593df5fb > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1474550583 > heartbeat:16351053 > STATUS:14:NORMAL,-1012946818076902491 > LOAD:16350923:3.32929676598E11 > SCHEMA:6748586:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:16351052:0.042265426367521286 > NET_VERSION:1:9 > HOST_ID:2:8e6417ec-e375-40a4-bb8d-3d99148846ff > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1464678970 > heartbeat:46292266 > STATUS:14:NORMAL,-100392380890457765 > LOAD:46292236:3.23397923906E11 > SCHEMA:36689950:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > 
RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46292265:0.0 > NET_VERSION:1:9 > HOST_ID:2:62323ea7-8d66-46fd-93e1-ff9bbf772c17 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467202117 > heartbeat:38639452 > STATUS:16:NORMAL,-1023597752726656847 > LOAD:38639304:3.28121029924E11 > SCHEMA:29037141:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38639451:0.0 > NET_VERSION:1:9 > HOST_ID:2:10b51458-800b-46f2-a80a-c2f2bbe1e46d > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479087904 > heartbeat:2588919 > STATUS:14:NORMAL,-1013592683752719902 > LOAD:2588748:2.78416391781E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2588918:0.0 > NET_VERSION:1:9 > HOST_ID:2:afcdc0a9-de5a-446d-868e-ee1dabfc0107 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1464513989 > heartbeat:46792633 > STATUS:14:NORMAL,-1063467021041506926 > LOAD:46792498:3.41276898418E11 > SCHEMA:37190355:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46792632:0.0 > NET_VERSION:1:9 > HOST_ID:2:79a69fe9-de49-49d3-b8cb-fbc59e08a821 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1479088908 > heartbeat:2585894 > STATUS:14:NORMAL,-1060190437871147906 > LOAD:2585849:2.45716342224E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2585893:0.0 > NET_VERSION:1:9 > HOST_ID:2:afc0ae8e-1126-426f-aa5b-0fce5578531c > RPC_READY:44:true > TOKENS:13:<hidden> > ## result of nodetool gossipinfo on 1.1.1.2 > /x.x.x.x > generation:1464511828 > heartbeat:46798699 > STATUS:14:NORMAL,-1022166408914492069 > LOAD:46798599:3.40042922062E11 > SCHEMA:37196731:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > 
DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46798698:0.0 > NET_VERSION:1:9 > HOST_ID:2:a08fbe58-779f-49f7-a0be-0a288d18c059 > RPC_READY:44:true > TOKENS:13:<hidden> > /1.1.1.1 > generation:1471493280 > heartbeat:25624222 > STATUS:25498:NORMAL,-1023351769271276256 > LOAD:25624146:3.51052126949E11 > SCHEMA:16021810:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.1 > SEVERITY:25624221:0.0 > NET_VERSION:1:9 > HOST_ID:2:b79124e9-394c-4400-a8e7-a0c94aec6878 > RPC_READY:25524:true > TOKENS:25497:<hidden> > /x.x.x.x > generation:1467205802 > heartbeat:38628440 > STATUS:14:NORMAL,-1124164164231176551 > LOAD:38628359:3.06404316564E11 > SCHEMA:29026044:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38628439:0.0 > NET_VERSION:1:9 > HOST_ID:2:6dabba2e-085d-497b-a97e-b6ebf65258d6 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1479090215 > heartbeat:2581923 > STATUS:16:NORMAL,-1034930499164644146 > LOAD:2581842:2.60571265374E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2581922:0.0 > NET_VERSION:1:9 > HOST_ID:2:a01d3304-5d8d-4ac9-935f-597ebf038e70 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089630 > heartbeat:2583690 > STATUS:16:NORMAL,-1104754836621297034 > LOAD:2583656:2.59767873976E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2583689:0.0 > NET_VERSION:1:9 > HOST_ID:2:81d4608f-d1df-4f52-b104-5f1c7a3b4c96 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1471482865 > heartbeat:25655708 > STATUS:28484:NORMAL,-1005894856009564957 > LOAD:25655674:3.35493584628E11 > SCHEMA:16053376:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > 
DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:25655707:0.0 > NET_VERSION:1:9 > HOST_ID:2:b39ee3e9-8f69-4f5c-8473-0e906c2301ca > RPC_READY:28512:true > TOKENS:28483:<hidden> > /1.1.1.2 > generation:1479090411 > heartbeat:2581314 > STATUS:14:NORMAL,-1016248794247274800 > LOAD:2581277:2.71127001342E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:1.1.1.2 > SEVERITY:2581316:0.0 > NET_VERSION:1:9 > HOST_ID:2:8caa54f3-7d67-40d6-b224-8c64a1d289be > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465163903 > heartbeat:44821587 > STATUS:14:NORMAL,-1077736070944113862 > LOAD:44821415:3.26343988237E11 > SCHEMA:35219220:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:44821586:0.0 > NET_VERSION:1:9 > HOST_ID:2:71db8e6a-cdbb-4924-860c-34e40731719f > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467203191 > heartbeat:38636325 > STATUS:14:NORMAL,-100689147463310850 > LOAD:38636153:3.15710669743E11 > SCHEMA:29033962:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38636327:0.0 > NET_VERSION:1:9 > HOST_ID:2:5582b905-7136-4e87-b1a9-d450ea2eb304 > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1465206313 > heartbeat:44692816 > STATUS:16:NORMAL,-101785227522144798 > LOAD:44692800:3.18542273643E11 > SCHEMA:35090498:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:44692815:0.0 > NET_VERSION:1:9 > HOST_ID:2:0ec779f9-3ee4-41a2-80b2-f27f25521964 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201538 > heartbeat:38641336 > STATUS:16:NORMAL,-1004145257859511465 > LOAD:38641252:3.22115490439E11 > SCHEMA:29038976:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > 
DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38641338:0.0 > NET_VERSION:1:9 > HOST_ID:2:0ab413a9-5608-4d7b-97dc-6ed655e76dd0 > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1467201918 > heartbeat:38640040 > STATUS:16:NORMAL,-1068518488491269501 > LOAD:38640012:3.30172166769E11 > SCHEMA:29037717:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:38640039:0.0 > NET_VERSION:1:9 > HOST_ID:2:7939c565-dc6e-43ad-b6ed-cfa3a19b20ab > RPC_READY:44:true > TOKENS:15:<hidden> > /x.x.x.x > generation:1479089944 > heartbeat:2582733 > STATUS:14:NORMAL,-109330739329126024 > LOAD:2582564:2.58856479776E11 > SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:2582732:0.042265426367521286 > NET_VERSION:1:9 > HOST_ID:2:d2cf2695-25b7-4aad-92ab-9a0b593df5fb > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1474550583 > heartbeat:16351050 > STATUS:14:NORMAL,-1012946818076902491 > LOAD:16350923:3.32929676598E11 > SCHEMA:6748586:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:16351049:0.0 > NET_VERSION:1:9 > HOST_ID:2:8e6417ec-e375-40a4-bb8d-3d99148846ff > RPC_READY:42:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1464678970 > heartbeat:46292263 > STATUS:14:NORMAL,-100392380890457765 > LOAD:46292236:3.23397923906E11 > SCHEMA:36689950:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9 > DC:6:datacenter1 > RACK:8:rack1 > RELEASE_VERSION:4:2.2.5 > RPC_ADDRESS:3:x.x.x.x > SEVERITY:46292265:0.0 > NET_VERSION:1:9 > HOST_ID:2:62323ea7-8d66-46fd-93e1-ff9bbf772c17 > RPC_READY:44:true > TOKENS:13:<hidden> > /x.x.x.x > generation:1467202117 > heartbeat:38639449 > STATUS:16:NORMAL,-1023597752726656847 > LOAD:38639304:3.28121029924E11 > 
>   SCHEMA:29037141:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9
>   DC:6:datacenter1
>   RACK:8:rack1
>   RELEASE_VERSION:4:2.2.5
>   RPC_ADDRESS:3:x.x.x.x
>   SEVERITY:38639451:0.0
>   NET_VERSION:1:9
>   HOST_ID:2:10b51458-800b-46f2-a80a-c2f2bbe1e46d
>   RPC_READY:44:true
>   TOKENS:15:<hidden>
> /x.x.x.x
>   generation:1479087904
>   heartbeat:2588919
>   STATUS:14:NORMAL,-1013592683752719902
>   LOAD:2588748:2.78416391781E11
>   SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9
>   DC:6:datacenter1
>   RACK:8:rack1
>   RELEASE_VERSION:4:2.2.5
>   RPC_ADDRESS:3:x.x.x.x
>   SEVERITY:2588918:0.0
>   NET_VERSION:1:9
>   HOST_ID:2:afcdc0a9-de5a-446d-868e-ee1dabfc0107
>   RPC_READY:44:true
>   TOKENS:13:<hidden>
> /x.x.x.x
>   generation:1464513989
>   heartbeat:46792633
>   STATUS:14:NORMAL,-1063467021041506926
>   LOAD:46792498:3.41276898418E11
>   SCHEMA:37190355:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9
>   DC:6:datacenter1
>   RACK:8:rack1
>   RELEASE_VERSION:4:2.2.5
>   RPC_ADDRESS:3:x.x.x.x
>   SEVERITY:46792632:0.0
>   NET_VERSION:1:9
>   HOST_ID:2:79a69fe9-de49-49d3-b8cb-fbc59e08a821
>   RPC_READY:44:true
>   TOKENS:13:<hidden>
> /x.x.x.x
>   generation:1479088908
>   heartbeat:2585891
>   STATUS:14:NORMAL,-1060190437871147906
>   LOAD:2585849:2.45716342224E11
>   SCHEMA:10:c777bf0c-72ca-3bad-ac52-b9ad1ecd23b9
>   DC:6:datacenter1
>   RACK:8:rack1
>   RELEASE_VERSION:4:2.2.5
>   RPC_ADDRESS:3:x.x.x.x
>   SEVERITY:2585890:0.0
>   NET_VERSION:1:9
>   HOST_ID:2:afc0ae8e-1126-426f-aa5b-0fce5578531c
>   RPC_READY:44:true
>   TOKENS:13:<hidden>
> {code}
> One thing that looks odd to us is the pending messages and the dropped
> MUTATION messages.
> nodetool netstats
> {code}
> ## result of nodetool netstats on 1.1.1.1
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed
> Large messages                  n/a         7              2
> Small messages                  n/a         0   129707138163
> Gossip messages                 n/a         0       27177898
> ## result of nodetool netstats on 1.1.1.2
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics: > Attempted: 0 > Mismatch (Blocking): 0 > Mismatch (Background): 3 > Pool Name Active Pending Completed > Large messages n/a 0 0 > Small messages n/a 484 19297343088 > Gossip messages n/a 0 2782060 > {code} > nodetool tpstats > {code} > ## result of nodetool tpstats on 1.1.1.1 > Pool Name Active Pending Completed Blocked All > time blocked > Native-Transport-Requests 0 0 55687222731 0 > 4381472 > ReadRepairStage 0 0 70 0 > 0 > ReadStage 0 0 39737862 0 > 0 > RequestResponseStage 1 0 65682103003 0 > 0 > CounterMutationStage 0 0 0 0 > 0 > MutationStage 0 0 100679076772 0 > 0 > MemtablePostFlush 0 0 215235 0 > 0 > GossipStage 0 0 26510703 0 > 0 > MiscStage 0 0 0 0 > 0 > PendingRangeCalculator 0 0 37 0 > 0 > AntiEntropyStage 0 0 0 0 > 0 > CacheCleanupExecutor 0 0 184 0 > 0 > MigrationStage 0 0 59 0 > 0 > HintedHandoff 0 0 93 0 > 0 > ValidationExecutor 0 0 0 0 > 0 > MemtableFlushWriter 0 0 74239 0 > 0 > InternalResponseStage 0 0 3636382 0 > 0 > Sampler 0 0 0 0 > 0 > MemtableReclaimMemory 0 0 74239 0 > 0 > CompactionExecutor 0 0 5694753 0 > 0 > Message type Dropped > RANGE_SLICE 0 > READ_REPAIR 0 > PAGED_RANGE 0 > READ 38268 > MUTATION 615972 > _TRACE 0 > REQUEST_RESPONSE 0 > COUNTER_MUTATION 0 > ## result of nodetool tpstats on 1.1.1.2 > Pool Name Active Pending Completed Blocked All > time blocked > Native-Transport-Requests 0 0 5783148174 0 > 970194 > ReadRepairStage 0 0 23 0 > 0 > ReadStage 0 0 227034 0 > 0 > RequestResponseStage 0 0 9656431051 0 > 0 > CounterMutationStage 0 0 0 0 > 0 > MutationStage 0 1 14494593737 0 > 0 > MemtablePostFlush 0 0 24720 0 > 0 > GossipStage 0 0 2667784 0 > 0 > MiscStage 0 0 0 0 > 0 > PendingRangeCalculator 0 0 26 0 > 0 > AntiEntropyStage 0 0 0 0 > 0 > CacheCleanupExecutor 0 0 0 0 > 0 > MigrationStage 0 0 0 0 > 0 > HintedHandoff 0 0 29 0 > 0 > ValidationExecutor 0 0 0 0 > 0 > MemtableFlushWriter 0 0 10555 0 > 0 > InternalResponseStage 0 0 3425759 0 > 0 > Sampler 0 0 0 0 > 0 > MemtableReclaimMemory 0 0 10555 0 > 0 
> CompactionExecutor 2 3 551545 0
> 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> READ                         0
> MUTATION                     0
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {code}
> We found a similar issue that was reported earlier
> (https://issues.apache.org/jira/browse/CASSANDRA-4740).
> *But I think ours has a different cause, because the problem has only
> been occurring in recent months.*
> {code}
> ## on 1.1.1.1
> $ netstat -tn | grep 1.1.1.2
> tcp        0      0 1.1.1.1:50103     1.1.1.2:7000      ESTABLISHED
> tcp        0      0 1.1.1.1:7000      1.1.1.2:45946     ESTABLISHED
> tcp        0      0 1.1.1.1:38178     1.1.1.2:7000      ESTABLISHED
> tcp        0      0 1.1.1.1:7000      1.1.1.2:59958     ESTABLISHED
> ## on 1.1.1.2
> $ netstat -tn | grep 1.1.1.2
> tcp        0      0 1.1.1.2:7000      1.1.1.16:32973    ESTABLISHED
> tcp        0      0 1.1.1.2:34929     1.1.1.16:7000     ESTABLISHED
> tcp        0     50 1.1.1.2:59270     1.1.1.16:7000     ESTABLISHED
> tcp        0      0 1.1.1.2:7000      1.1.1.1:50103     ESTABLISHED
> tcp        0      0 1.1.1.2:45946     1.1.1.1:7000      ESTABLISHED
> tcp        0      0 1.1.1.2:59958     1.1.1.1:7000      ESTABLISHED
> tcp        0      0 1.1.1.2:7000      1.1.1.16:55680    ESTABLISHED
> tcp        0      0 1.1.1.2:7000      1.1.1.1:38178     ESTABLISHED
> {code}
> The JVM and kernel are the same on every node.
> {code}
> $ uname -a
> Linux ip-10-211-195-229.ap-northeast-2.compute.internal 4.5.0-coreos-r1 #2
> SMP Thu May 26 22:21:06 UTC 2016 x86_64 Linux
> $ java -version
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> $ which java
> /pang/program/jdk/bin/java
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)