[ 
https://issues.apache.org/jira/browse/CASSANDRA-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-18347:
----------------------------------------
    Resolution: Not A Problem
        Status: Resolved  (was: Open)

> CEP-21: Startup failures in Python dtests around TCM_REPLAY_REQ
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-18347
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18347
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership, Cluster/Schema
>            Reporter: Caleb Rackliffe
>            Priority: Normal
>             Fix For: NA
>
>
> There are currently widespread, locally reproducible failures in the Python 
> dtests against the {{cep-21-tcm}} branch. For example:
>  
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra topology_test.py::TestTopology::test_decommissioned_node_cant_rejoin{noformat}
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra materialized_views_test.py::TestMaterializedViews::test_query_new_column{noformat}
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra read_repair_test.py::TestSpeculativeReadRepair::test_normal_read_repair{noformat}
> https://app.circleci.com/pipelines/github/maedhroz/cassandra/701/workflows/44a5c7e0-0de0-4839-bbd0-80771fe23843/jobs/7251
> https://app.circleci.com/pipelines/github/beobal/cassandra/406/workflows/00cdb02e-4b3e-477a-b997-403121172249/jobs/4204/tests
> The death spiral in the node startup logs starts like this…
> {noformat}
> WARN  [Messaging-EventLoop-3-1] 2023-03-17 11:55:34,037 NoSpamLogger.java:108 
> - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping 
> message of type TCM_REPLAY_REQ whose timeout expired before reaching the 
> network
> ERROR [InternalResponseStage:3] 2023-03-17 11:55:34,038 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000], 
> checkLive=false}
> INFO  [Messaging-EventLoop-3-12] 2023-03-17 11:55:34,099 
> InboundConnectionInitiator.java:567 - 
> /127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-1b9301b6 
> messaging connection established, version = 13, framing = CRC, 
> encryption = unencrypted
> INFO  [Messaging-EventLoop-3-9] 2023-03-17 11:55:34,099 
> OutboundConnection.java:1164 - 
> /127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-a9302b2e 
> successfully connected, version = 13, framing = CRC, encryption = unencrypted
> WARN  [InternalMetadataStage:5] 2023-03-17 11:55:34,100 NoSpamLogger.java:108 
> - Not currently a member of the CMS
> INFO  [Messaging-EventLoop-3-13] 2023-03-17 11:55:34,102 
> InboundConnectionInitiator.java:567 - 
> /127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-f887f6fa 
> messaging connection established, version = 13, framing = CRC, 
> encryption = unencrypted
> INFO  [Messaging-EventLoop-3-11] 2023-03-17 11:55:34,102 
> OutboundConnection.java:1164 - 
> /127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-5cd0c637 
> successfully connected, version = 13, framing = CRC, encryption = unencrypted
> ERROR [InternalResponseStage:4] 2023-03-17 11:55:49,237 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, 
> /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:8] 2023-03-17 11:55:49,394 NoSpamLogger.java:108 
> - Not currently a member of the CMS
> WARN  [Messaging-EventLoop-3-1] 2023-03-17 11:56:04,636 NoSpamLogger.java:108 
> - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping 
> message of type TCM_REPLAY_REQ whose timeout expired before reaching the 
> network
> ERROR [InternalResponseStage:5] 2023-03-17 11:56:04,637 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, 
> /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:11] 2023-03-17 11:56:04,892 
> NoSpamLogger.java:108 - Not currently a member of the CMS
> ...
> ERROR [InternalResponseStage:6] 2023-03-17 11:56:20,335 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000], 
> checkLive=false}
> WARN  [InternalMetadataStage:14] 2023-03-17 11:56:20,391 
> NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:7] 2023-03-17 11:56:21,750 
> RemoteProcessor.java:164 - Got error from /127.0.0.3:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.1:7000, /127.0.0.2:7000, 
> /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.1:7000, /127.0.0.2:7000, 
> /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000], 
> checkLive=false}
> WARN  [Messaging-EventLoop-3-1] 2023-03-17 11:56:35,535 NoSpamLogger.java:108 
> - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping 
> message of type TCM_REPLAY_REQ whose timeout expired before reaching the 
> network
> ERROR [InternalResponseStage:8] 2023-03-17 11:56:35,537 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, 
> /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:17] 2023-03-17 11:56:35,693 
> NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:9] 2023-03-17 11:56:37,135 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, 
> /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, 
> /127.0.0.2:7000, /127.0.0.1:7000, 
> /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000], 
> checkLive=false}
> WARN  [InternalMetadataStage:20] 2023-03-17 11:56:37,540 
> NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:10] 2023-03-17 11:56:50,935 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.1:7000,
> /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.1:7000], checkLive=false}
> WARN  [InternalMetadataStage:23] 2023-03-17 11:56:51,191 
> NoSpamLogger.java:108 - Not currently a member of the CMS
> {noformat}
> ...and ends here:
> {noformat}
> ERROR [InternalResponseStage:11] 2023-03-17 11:56:53,036 
> RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when 
> sending TCM_REPLAY_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.1:7000,
> /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, 
> /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> Exception (java.lang.IllegalStateException) encountered during startup: Could 
> not succeed sending TCM_REPLAY_REQ to 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, 
> /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, 
> /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], 
> checkLive=false} after 10 tries
> ERROR [main] 2023-03-17 11:56:53,546 CassandraDaemon.java:929 - Exception 
> encountered during startup
> java.lang.IllegalStateException: Could not succeed sending TCM_REPLAY_REQ to 
> CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, 
> /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, 
> /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, 
> /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, 
> /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false} after 10 tries
>         at 
> org.apache.cassandra.tcm.RemoteProcessor.sendWithCallback(RemoteProcessor.java:181)
>         at 
> org.apache.cassandra.tcm.RemoteProcessor.replayAndWait(RemoteProcessor.java:118)
>         at 
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.replayAndWait(ClusterMetadataService.java:577)
>         at 
> org.apache.cassandra.tcm.Startup.initializeForDiscovery(Startup.java:149)
>         at org.apache.cassandra.tcm.Startup.initialize(Startup.java:84)
>         at org.apache.cassandra.tcm.Startup.initialize(Startup.java:59)
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:267)
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:777)
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:907)
> ...
> {noformat}



