[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346172#comment-17346172 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - Absolutely, good catch! Thank you! > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0, 4.0-rc, 4.x > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345987#comment-17345987 ] Berenguer Blasi commented on CASSANDRA-16644: - [~e.dimitrova] I committed the dtest change as that is independent of the ccm change. The ccm change I noticed there are changes in ccm master not tagged as cassandra-test yet. Hence I am gathering info on whether this is intentional or not. Once I know that I will commit the ccm change, I hope that is ok with you but I am being extra cautious here. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344567#comment-17344567 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - When I mentioned the long tests, I didn't mean that this is the case here but I wanted to give an example of another class where there was different solution to timeout problems and we should verify those we see now why they appear and start from there. Thank you, everything looks good to me! > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344383#comment-17344383 ] Berenguer Blasi commented on CASSANDRA-16644: - I did a last CI run and it [lgtm|https://app.circleci.com/pipelines/github/bereng/cassandra/287/workflows/7d1e15b7-9d2d-4347-80ee-ab8cb1166155/jobs/2723] Timeouts imo are not related to having expanded the test so now it better fits into some other 'long' category as it was your case. I just think we're close to the timeouts but time will tell. We don't have to decide that now. If you're +1 I'll revert the requirements.txt change and then we can commit. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343844#comment-17343844 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - Assuming you tested 16667 with the new ccm logging and you see the same issue, I am +1. Let's run the multiplexer again for both tests to confirm the issue is solved and commit. Thank you! {quote}This lead me to speculate we probably have low generic timeouts now for jenkins hence my proposal to raise them. {quote} I suggest we open an umbrella ticket to look at those classes/tests. In another ticket the solution was to move two test classes to long run tests. While I see what you mean, the number of failures we see is still just a few tests compared and not enough justification for me to raise the time for the thousands of tests we have which will lead to significant CI run time increase in case of failures. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343713#comment-17343713 ] Berenguer Blasi commented on CASSANDRA-16644: - [~e.dimitrova] so if you peruse the latest runs you will notice [here|https://ci-cassandra.apache.org/job/Cassandra-4.0/38/testReport/junit/dtest-offheap.repair_tests.incremental_repair_test/TestIncRepair/test_multiple_repair/] and [here|https://ci-cassandra.apache.org/job/Cassandra-4.0/38/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_forwards_and_cleanup/] dtests timeouts on common operations like cql queries or node/cluster starts. Some for this test class but different test method, some for other tests. This lead me to speculate we probably have low generic timeouts now for jenkins hence my proposal to raise them. Of course we can do as you suggested and go one by one instead. I.e. I included a fix for CASSANDRA-16667 here for convinience. So: - I updated the [PR|https://github.com/apache/cassandra-dtest/pull/135] to it's final version + the 16667 fix - CCM [PR|https://github.com/ekaterinadimitrova2/ccm/commit/60515b59cc59dd4ac500321f6b6b3a8818ed7f88] stays the same - If you +1 I run a final CI and commit. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343286#comment-17343286 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - Bq. _The requirements.txt change I already mentioned it goes out. Good._ My bad, I missed it, apologize Bq. _On the timeout issue they seem to be coming up a bit more now. But we can go one at a time. Good._ The link points to a failure of this test but I looked at the last 5 runs in Jenkins and there are less than 10 failures per run, some of them already known. Did I miss something else you wanted to point out to me? Just double checking myself. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343261#comment-17343261 ] Berenguer Blasi commented on CASSANDRA-16644: - The requirements.txt change I already mentioned it goes out. Good. On the timeout issue they seem to be coming up a bit more [now|https://ci-cassandra.apache.org/job/Cassandra-4.0/38/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_forwards_and_cleanup/]. But we can go one at a time. Good. Yep I can remove the other 'flaky' mentions in the class. Good > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343252#comment-17343252 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - Bq. Can you please confirm we're talking about the same flaky and this is ready to commit with? I left a few comments in GitHub, most important, please, return the riptano/ccm repo. I used mine in requirements.txt only for test purposes. Thanks Bq. I am thinking of just generally raising for all tests those 90s to 120s or 180s. Wdyt? -1. Were have only a few new flakes and raising the timeout for the whole suite doesn't seem right to me. This will lead to even longer CI runs in case of failures. I suggest we first look at those few ones and the reason for flakiness and start from there. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343123#comment-17343123 ] Berenguer Blasi commented on CASSANDRA-16644: - Looking at jenkins CI failures and logs today I have the impression other tests, maybe bc of the recent docker parallelization changes, are starting to hit timeouts. I am thinking of just generally raising for all tests those 90s to 120s or 180s. Wdyt? > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343005#comment-17343005 ] Berenguer Blasi commented on CASSANDRA-16644: - The flaky was already [removed|https://github.com/apache/cassandra-dtest/pull/135/files#diff-bea9d3b0d863b1adc1bfe294ec6af0fb32c07b8869bde21c5f9c71f9bf4ad968L229] Can you please confirm we're talking about the same flaky and this is ready to commit with? - CCM [PR|https://github.com/ekaterinadimitrova2/ccm/commit/60515b59cc59dd4ac500321f6b6b3a8818ed7f88] - C* OSS [PR|https://github.com/apache/cassandra-dtest/pull/135/files] (minus the requirements.txt change) > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342721#comment-17342721 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I just realized... please fix the flaky annotation, let's not leave it not serving its goal and only bringing confusion. Thanks! > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342258#comment-17342258 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I fixed the CCM patch [here|https://github.com/ekaterinadimitrova2/ccm/commit/60515b59cc59dd4ac500321f6b6b3a8818ed7f88], I had a small mistake due to which the TAIL was always empty. Now it is fine, you can see where the method stopped reading from the log. You can see in the latest [runs|https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/808/workflows/737073ba-0e24-4a1f-9cdb-fffbef041a73/jobs/4568/tests#failed-test-0] I submitted with lower timeout, the message we look for is in the log but _watch_log_for_ method doesn't find the message; we just reached the timeout earlier before reaching the message in the log which we read line by line. I am +1 on the raised timeout plus updated CCM debug message. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341874#comment-17341874 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I pushed it on Friday with different timeout values and got a time when it fails but I wanted to try today with the final version of the multiplexer so I can easily track down the logs. ([~adelapena] added that option to easily find the logs you need) I am fine to remove the ccm patch but at least rename correctly the tail to head. I will let you know if I find anything new in the logs > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341760#comment-17341760 ] Berenguer Blasi commented on CASSANDRA-16644: - Tail patch wasn't producing anything. But seems that just by increasing the timeout the thing runs [clean|https://app.circleci.com/pipelines/github/bereng/cassandra/282/workflows/793a0963-d319-4667-acaf-a74ec3e8b84f/jobs/2697]. It still irks me we're not bottoming out on this thing... Proposed PR [here|https://github.com/apache/cassandra-dtest/pull/135] but I would remove the ccm from Ekaterina. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17341720#comment-17341720 ] Berenguer Blasi commented on CASSANDRA-16644: - I went ahead and gave the multiplexer one more go. I got plenty of [failures|https://app.circleci.com/pipelines/github/bereng/cassandra/281/workflows/2afcd87f-ce10-4495-9153-a362e198d3cc/jobs/2691/artifacts] but the ones I could look at were legit. We can see [here|https://circle-production-customer-artifacts.s3.amazonaws.com/picard/5e9576564660060baefcf188/6098c61a1823b752f1333400-0-build/artifacts/dtest/stdout.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256=20210510T064321Z=host=60=AKIAJR3Q6CR467H7Z55A%2F20210510%2Fus-east-1%2Fs3%2Faws4_request=f108cc3b72f8eb9728c8c703b518ea0fed001a39ef1a8f3537be43c4e80b3396] the failure from {{2021-05-10 05:39:37,970}} it's indeed missing the message when you track down the log [file|https://circle-production-customer-artifacts.s3.amazonaws.com/picard/5e9576564660060baefcf188/6098c61a1823b752f1333400-0-build/artifacts/dtest_logs/1620625235394_test_move_backwards_and_cleanup%5B2-10%5D/node4.log?X-Amz-Algorithm=AWS4-HMAC-SHA256=20210510T064822Z=host=59=AKIAJR3Q6CR467H7Z55A%2F20210510%2Fus-east-1%2Fs3%2Faws4_request=6915219122fd8bd306dc0ae23ecac80f9ae1193254d97a2d51b9e04c9220c64c]. So I am going to increase timeout and see if I can get some good failures. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340845#comment-17340845 ] Berenguer Blasi commented on CASSANDRA-16644: - Why wait if we can trigger it? give it one more try imo. It's just a couple clicks if I am not missing anything? > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340844#comment-17340844 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I suggest we commit the debug patch and next time it appears if it appears we can think about it, then we will have the exact view what was read from the log. This is a test/infra issue and also it appears very rarely. It is also for experimental feature. WDYT? > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340706#comment-17340706 ] Berenguer Blasi commented on CASSANDRA-16644: - [~e.dimitrova] I have been testing this with diff timeouts and they are either too short or too long so the failure can't be repro'd like that unfortunately. So I looked at your patch and I think it is the right thing to do indeed to get some more info. I can't explain why you didn't repro bc you have the same exact code I would have produced :shrug: Would you mind running it again? maybe this time we'll be lucky. Otherwise we'll ask Andres which is in good terms with the bug repro Gods and see if he can with your patch. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339996#comment-17339996 ] Berenguer Blasi commented on CASSANDRA-16644: - Hi [~e.dimitrova] thx for looking into this. I was buried yesterday and couldn't look much but I hope to do so today. Yes the 'flaky' annotation will either be removed or moved to '2' once we reach a conclusion. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339943#comment-17339943 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - That is so weird, I tried to push two times 1000 runs since last night, same config and same jobs on java 11 as [~adelapena] and I can't reproduce anything. I would like to suggest this [tiny patch |https://github.com/ekaterinadimitrova2/ccm/commit/3f4ba3597f0809201f88dae403a15c33d12dd021] which in case it fails again will help us to see where it was. The _Tail_ added before is actually _Head_ so I decided to add both. I think having both would be good for debugging in such weird cases. WDYT? Also, on the additional topic I brought, I think it is meaningless to keep _@flaky(max_runs=1)_, either we should consider there was a mistake and 2 was aimed or we can remove it to prevent further confusion. WDYT? > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339627#comment-17339627 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I played with the new multiplexer yesterday, added some debug messages but my first try of 1000 runs didn't fail even once. I tried both java 8 and java 11. Started another run with more repetitions but it fails as the build is failing for some reason. I will check again later today when the build is fixed > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339516#comment-17339516 ] Berenguer Blasi commented on CASSANDRA-16644: - The only failure I could track down from [~adelapena] multi run repros the problem We can see [here|https://circle-production-customer-artifacts.s3.amazonaws.com/picard/52e11c3bf7d0f78c41388745/608de58867e0447553dcf233-1-build/artifacts/dtest/stdout.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256=20210505T081400Z=host=60=AKIAJR3Q6CR467H7Z55A%2F20210505%2Fus-east-1%2Fs3%2Faws4_request=e0c2ac533b07578c306d0500b9b9c618a04799610ae57e9294a0c50827bea00d] bq. ccmlib.node.TimeoutError: 02 May 2021 00:01:09 [node4] after 90.12/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log it hasn't found it on 00:01:09 when it was already present at 00:00:27 bq. INFO [main] 2021-05-02 00:00:27,098 PipelineConfigurator.java:125 - Starting listening for CQL clients on /127.0.0.4:9042 (unencrypted)... So sthg doesn't add up somewhere indeed... > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339392#comment-17339392 ] Berenguer Blasi commented on CASSANDRA-16644: - Thx for the clarification [~e.dimitrova]. That would mean there is around 20s delay between the log line being issued and actually making it into the log file. I will take a look at Andres multiplex run and see if I can see anything in there. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338953#comment-17338953 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I am sorry [~bereng] , it seems I confused you. These are two different unrelated things I was talking about: 1) I saw the @flaky mark and I felt that it is worth it to mention it while we are looking into these tests as it seems useless now. 2) Indeed, the current timeout issue is a different problem. So what I meant is that the log file is processed line by line as far as I remember, if for some reason the processing was slow, that line where the message is might have not been reached yet until the 90 seconds pass. I will try to add some logging and use the new multiplexer job from [~adelapena]. Thanks [~adelapena], indeed it helps. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338924#comment-17338924 ] Andres de la Peña commented on CASSANDRA-16644: --- If it's of any help, it seems that the CircleCi-based test multiplexer proposed in CASSANDRA-16625 has managed to reproduce the failure 54 times in [this run|https://app.circleci.com/pipelines/github/adelapena/cassandra/376/workflows/eb463f0a-ad93-488a-89a3-bb44fbc0c578/jobs/3270] with 1000 iterations. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338743#comment-17338743 ] Berenguer Blasi commented on CASSANDRA-16644: - Flaky or not, what I am worried about is the log perusing logic not having caught up on that message being present in the logs indeed. Either that or the log messages reach the log with a delay. But then it would mean there's a 20s delay somewhere which makes no sense. If that'd be the case adding flaky=2 won't help here as that would be a general problem affecting all tests. So I having no sensible proposal I can only suggest we close until/if it resurfaces as investigating further won't take me anywhere imo. But maybe a new pair of eyes have better ideas wdyt? > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338718#comment-17338718 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I feel that It is still a matter of timing and that the difference between the messages when they appeared in the logs doesn't mean that It took the same time for them to be processed, maybe? I had some trouble with my local ccm and didn't manage to check this but you might want to try first reducing significantly the time out to verify my theory. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338593#comment-17338593 ] Ekaterina Dimitrova commented on CASSANDRA-16644: - I want to look at the test more but my first observation is the tests in that class are marked flaky with max number of runs 1. At first I thought it is 1 run more if it fails but a quick check showed me just one run at all which makes flaky useless in this case. So if nothing else and those should be really flaky (I didn't find a discussion around them in the original ticket where they were added), at least I would say run them twice or mark them with _@flaky(max_runs=2)_ > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-rc > > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in > system.log: > E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > ../venv/src/ccm/ccmlib/node.py:56: TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337154#comment-17337154 ] Berenguer Blasi commented on CASSANDRA-16644: - In order to follow the chain of events you need the logs [here|https://nightlies.apache.org/cassandra/trunk/Cassandra-trunk-dtest-novnode/403/Cassandra-trunk-dtest-novnode/label=cassandra-dtest,split=64/] The test fails on bq. ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log: But inspecting stdout logs we can see the node is started around 23:59:21 {noformat} 23:59:21,502 ccm INFO Starting node4 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.0, install_dir=/home/cassandra/cassandra 00:00:19,524 cassandra.cluster INFO New Cassandra host discovered {noformat} in node4 system logs we can see the actual message is really present in the log and the node has started: {noformat} INFO [main] 2021-04-27 00:00:18,493 PipelineConfigurator.java:125 - Starting listening for CQL clients on /127.0.0.4:9042 (unencrypted)... {noformat} We seem to have a failure where a message present at 00:00:18 is not being detected and the failure is triggered at 00:01:00 which makes no sense. I have followed the code looking for holes in the log {{mark}} logic but I didn't see any obvious problems. The test seems pretty solid looking at jenkins history, it is a test problem not a code problem, we have no previous tickets on it, there doesn't seem to be anything else to investigate,... so I am proposing closing this one and reopening if it raises again. Then I would add some further debug to have some thread we start pulling. > Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup > -- > > Key: CASSANDRA-16644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16644 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest/python >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > > Failure > [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/] > {noformat} > Error Message > ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds > Missing: ['Starting listening for CQL clients'] not found in system.log: > Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura > Stacktrace > self = at 0x7f66c51ebd90> > @flaky(max_runs=1) > @pytest.mark.no_vnodes > def test_move_backwards_and_cleanup(self): > """Test moving a node backwards without moving past a neighbor > token""" > move_token = '5' > expected_after_move = [gen_expected(range(0, 6), range(31, 40)), >gen_expected(range(0, 21, 2)), >gen_expected(range(1, 6, 2), range(6, 31)), >gen_expected(range(7, 20, 2), range(21, 40))] > expected_after_repair = [gen_expected(range(0, 6), range(31, 40)), > gen_expected(range(0, 21)), > gen_expected(range(6, 31)), > gen_expected(range(21, 40))] > > self.move_test(move_token, expected_after_move, expected_after_repair) > transient_replication_ring_test.py:333: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > transient_replication_ring_test.py:235: in move_test > node4.start(wait_for_binary_proto=True) > transient_replication_ring_test.py:50: in new_start > return old_start(*args, **kwargs) > ../venv/src/ccm/ccmlib/node.py:901: in start > self.wait_for_binary_interface(from_mark=self.mark) > ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface > self.watch_log_for("Starting listening for CQL clients", **kwargs) > ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1619481570.0236456, timeout = 90 > msg = "Missing: ['Starting listening for CQL clients'] not found in > system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura" > node = 'node4' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after > 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in >