[ 
https://issues.apache.org/jira/browse/CASSANDRA-18660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742701#comment-17742701
 ] 

Berenguer Blasi commented on CASSANDRA-18660:
---------------------------------------------

If you can't repro locally there's nothing else we can do. It still irks me the 
others didn't notice bc as you say the logs seem correct. If there is some ccm 
bug we'll hear about it. For the time being this is the best we can do agreed. 
+1

> transient_replication_ring_test.py::TestTransientReplicationRing::test_bootstrap_and_cleanup
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18660
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18660
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Andres de la Peña
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 5.x
>
>
> {{transient_replication_ring_test.py::TestTransientReplicationRing::test_bootstrap_and_cleanup}}
>  is flaky at least in CircleCI and {{trunk}}:
> https://app.circleci.com/pipelines/github/adelapena/cassandra/3018/workflows/2c81b63f-63b8-4baf-8945-d3c680edede6/jobs/58482/tests
> {code}
> ccmlib.node.TimeoutError: 11 Jul 2023 13:59:08 [node3] after 120.11/120 
> seconds Missing: ['127.0.0.2:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-6] 2023-07-11 13:57:0
>  Tail: 
> ...7.0.0.3:7000(/127.0.0.1:56442)->/127.0.0.2:7000-LARGE_MESSAGES-da6108fe 
> successfully connected, version = 12, framing = CRC, encryption = unencrypted
> self = <transient_replication_ring_test.TestTransientReplicationRing object 
> at 0x7f2f340c2c88>
>     @flaky(max_runs=1)
>     @pytest.mark.no_vnodes
>     def test_bootstrap_and_cleanup(self):
>         """Test bootstrapping a new node across a mix of repaired and 
> unrepaired data"""
>         main_session = self.patient_cql_connection(self.node1)
>         nodes = [self.node1, self.node2, self.node3]
>     
>         for i in range(0, 40, 2):
>             self.insert_row(i, i, i, main_session)
>     
>         sessions = [self.exclusive_cql_connection(node) for node in 
> [self.node1, self.node2, self.node3]]
>     
>         expected = [gen_expected(range(0, 11, 2), range(22, 40, 2)),
>                     gen_expected(range(0, 22, 2), range(32, 40, 2)),
>                     gen_expected(range(12, 31, 2))]
>         self.check_expected(sessions, expected)
>     
>         # Make sure at least a little data is repaired, this shouldn't move 
> data anywhere
>         repair_nodes(nodes)
>     
>         self.check_expected(sessions, expected)
>     
>         # Ensure that there is at least some transient data around, because 
> of this if it's missing after bootstrap
>         # We know we failed to get it from the transient replica losing the 
> range entirely
>         nodes[1].stop(wait_other_notice=True)
>     
>         for i in range(1, 40, 2):
>             self.insert_row(i, i, i, main_session)
>     
> >       nodes[1].start(wait_for_binary_proto=True)
> transient_replication_ring_test.py:184: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> transient_replication_ring_test.py:52: in new_start
>     return old_start(*args, **kwargs)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:896: in start
>     node.watch_log_for_alive(self, from_mark=mark)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:665: in 
> watch_log_for_alive
>     self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:593: in watch_log_for
>     head=reads[:50], tail="..."+reads[len(reads)-150:]))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1689083828.8439646, timeout = 120
> msg = "Missing: ['127.0.0.2:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-6] 2023-07-11 
> 1...127.0.0.2:7000-LARGE_MESSAGES-da6108fe successfully connected, version = 
> 12, framing = CRC, encryption = unencrypted\n"
> node = 'node3'
>     @staticmethod
>     def raise_if_passed(start, timeout, msg, node=None):
>         if start + timeout < time.time():
> >           raise TimeoutError.create(start, timeout, msg, node)
> E           ccmlib.node.TimeoutError: 11 Jul 2023 13:59:08 [node3] after 
> 120.11/120 seconds Missing: ['127.0.0.2:7000.* is now UP'] not found in 
> system.log:
> E            Head: INFO  [Messaging-EventLoop-3-6] 2023-07-11 13:57:0
> E            Tail: 
> ...7.0.0.3:7000(/127.0.0.1:56442)->/127.0.0.2:7000-LARGE_MESSAGES-da6108fe 
> successfully connected, version = 12, framing = CRC, encryption = unencrypted
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> The CircleCI config to reproduce can be generated with:
> {code}
> .circleci/generate.sh -p \
>   -e REPEATED_DTESTS_COUNT=500 \
>   -e 
> REPEATED_DTESTS=transient_replication_ring_test.py::TestTransientReplicationRing::test_bootstrap_and_cleanup
> {code}
> Flakiness seems ~1%



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to