[ https://issues.apache.org/jira/browse/CASSANDRA-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-18347:
----------------------------------------
    Resolution: Not A Problem
        Status: Resolved  (was: Open)

> CEP-21: Startup failures in Python dtests around TCM_REPLAY_REQ
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-18347
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18347
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership, Cluster/Schema
>            Reporter: Caleb Rackliffe
>            Priority: Normal
>             Fix For: NA
>
>
> There are currently widespread, locally reproducible failures in the Python dtests against the {{cep-21-tcm}} branch. For example...
>
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra topology_test.py::TestTopology::test_decommissioned_node_cant_rejoin{noformat}
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra materialized_views_test.py::TestMaterializedViews::test_query_new_column{noformat}
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra read_repair_test.py::TestSpeculativeReadRepair::test_normal_read_repair{noformat}
> https://app.circleci.com/pipelines/github/maedhroz/cassandra/701/workflows/44a5c7e0-0de0-4839-bbd0-80771fe23843/jobs/7251
> https://app.circleci.com/pipelines/github/beobal/cassandra/406/workflows/00cdb02e-4b3e-477a-b997-403121172249/jobs/4204/tests
> The death spiral in the node startup logs starts like this…
> {noformat}
> WARN  [Messaging-EventLoop-3-1] 2023-03-17 11:55:34,037 NoSpamLogger.java:108 - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message of type TCM_REPLAY_REQ whose timeout expired before reaching the network
> ERROR [InternalResponseStage:3] 2023-03-17 11:55:34,038 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000], checkLive=false}
> INFO  [Messaging-EventLoop-3-12] 2023-03-17 11:55:34,099 InboundConnectionInitiator.java:567 - /127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-1b9301b6 messaging connection established, version = 13, framing = CRC, encryption = unencrypted
> INFO  [Messaging-EventLoop-3-9] 2023-03-17 11:55:34,099 OutboundConnection.java:1164 - /127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-a9302b2e successfully connected, version = 13, framing = CRC, encryption = unencrypted
> WARN  [InternalMetadataStage:5] 2023-03-17 11:55:34,100 NoSpamLogger.java:108 - Not currently a member of the CMS
> INFO  [Messaging-EventLoop-3-13] 2023-03-17 11:55:34,102 InboundConnectionInitiator.java:567 - /127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-f887f6fa messaging connection established, version = 13, framing = CRC, encryption = unencrypted
> INFO  [Messaging-EventLoop-3-11] 2023-03-17 11:55:34,102 OutboundConnection.java:1164 - /127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-5cd0c637 successfully connected, version = 13, framing = CRC, encryption = unencrypted
> ERROR [InternalResponseStage:4] 2023-03-17 11:55:49,237 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:8] 2023-03-17 11:55:49,394 NoSpamLogger.java:108 - Not currently a member of the CMS
> WARN  [Messaging-EventLoop-3-1] 2023-03-17 11:56:04,636 NoSpamLogger.java:108 - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message of type TCM_REPLAY_REQ whose timeout expired before reaching the network
> ERROR [InternalResponseStage:5] 2023-03-17 11:56:04,637 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:11] 2023-03-17 11:56:04,892 NoSpamLogger.java:108 - Not currently a member of the CMS
> ...
> ERROR [InternalResponseStage:6] 2023-03-17 11:56:20,335 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:14] 2023-03-17 11:56:20,391 NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:7] 2023-03-17 11:56:21,750 RemoteProcessor.java:164 - Got error from /127.0.0.3:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000], checkLive=false}
> WARN  [Messaging-EventLoop-3-1] 2023-03-17 11:56:35,535 NoSpamLogger.java:108 - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message of type TCM_REPLAY_REQ whose timeout expired before reaching the network
> ERROR [InternalResponseStage:8] 2023-03-17 11:56:35,537 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:17] 2023-03-17 11:56:35,693 NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:9] 2023-03-17 11:56:37,135 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:20] 2023-03-17 11:56:37,540 NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:10] 2023-03-17 11:56:50,935 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:23] 2023-03-17 11:56:51,191 NoSpamLogger.java:108 - Not currently a member of the CMS
> {noformat}
> ...and ends here:
> {noformat}
> ERROR [InternalResponseStage:11] 2023-03-17 11:56:53,036 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> Exception (java.lang.IllegalStateException) encountered during startup: Could not succeed sending TCM_REPLAY_REQ to CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false} after 10 tries
> ERROR [main] 2023-03-17 11:56:53,546 CassandraDaemon.java:929 - Exception encountered during startup
> java.lang.IllegalStateException: Could not succeed sending TCM_REPLAY_REQ to CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false} after 10 tries
>         at org.apache.cassandra.tcm.RemoteProcessor.sendWithCallback(RemoteProcessor.java:181)
>         at org.apache.cassandra.tcm.RemoteProcessor.replayAndWait(RemoteProcessor.java:118)
>         at org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.replayAndWait(ClusterMetadataService.java:577)
>         at org.apache.cassandra.tcm.Startup.initializeForDiscovery(Startup.java:149)
>         at org.apache.cassandra.tcm.Startup.initialize(Startup.java:84)
>         at org.apache.cassandra.tcm.Startup.initialize(Startup.java:59)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:267)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:777)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:907)
> ...
> {noformat}