[ 
https://issues.apache.org/jira/browse/CASSANDRA-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189074#comment-17189074
 ] 

Berenguer Blasi commented on CASSANDRA-16061:
---------------------------------------------

This is the failure for reference as ci-cass logs are gone now:

{noformat}
===Flaky Test Report===

test_move_forwards_and_cleanup failed; it passed 0 out of the required 1 times.
        <class 'ccmlib.node.TimeoutError'>
        02 Sep 2020 07:18:39 [node4] Missing: ['Starting listening for CQL 
clients']:
INFO  [main] 2020-09-02 09:17:30,390 YamlConfigura.....
See system.log for remainder
        [<TracebackEntry 
/media/sf_VBoxSharedFolder/dtestsvbox/transient_replication_ring_test.py:301>, 
<TracebackEntry 
/media/sf_VBoxSharedFolder/dtestsvbox/transient_replication_ring_test.py:231>, 
<TracebackEntry 
/media/sf_VBoxSharedFolder/dtestsvbox/transient_replication_ring_test.py:47>, 
<TracebackEntry 
/media/sf_VBoxSharedFolder/dtestsvbox/src/ccm/ccmlib/node.py:798>, 
<TracebackEntry 
/media/sf_VBoxSharedFolder/dtestsvbox/src/ccm/ccmlib/node.py:591>, 
<TracebackEntry 
/media/sf_VBoxSharedFolder/dtestsvbox/src/ccm/ccmlib/node.py:548>]

===End Flaky Test Report===
{noformat}

It's hard to reproduce. There is some exotic race where 
[bootstrap.get()|https://github.com/apache/cassandra/blob/23ba48aa935d3f81e66b65285fa8e7972f94dcfe/src/java/org/apache/cassandra/service/StorageService.java#L1584]
 blocks indefinitely because a default connection never completes; it blocks 
[here|https://github.com/apache/cassandra/blob/23ba48aa935d3f81e66b65285fa8e7972f94dcfe/src/java/org/apache/cassandra/streaming/DefaultConnectionFactory.java#L50].
 The test then simply times out.

That shouldn't happen, as the default connection has a built-in timeout. Even 
forcing a timeout myself when waiting on it doesn't help. Somehow 
connecting to node1 is not possible.
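
For anyone picking this up, the difference between an unbounded wait and a 
forced timeout can be sketched roughly as below. This is only an illustration 
using plain java.util.concurrent, not the Cassandra or Netty code; the class 
{{BoundedWait}} and the method {{completesWithin}} are hypothetical names made 
up for the example. Note that a bounded wait like this only turns a hang into 
a visible failure; it does not explain why the connection cannot be 
established.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Cassandra: shows a bounded wait on a future.
public class BoundedWait {

    /**
     * Waits up to timeoutMillis for the future to complete.
     * Returns true if it completed in time, false if the wait timed out.
     * A plain future.get() with no arguments would instead block (park the
     * calling thread) until the future completes, possibly forever.
     */
    static boolean completesWithin(Future<?> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stands in for a connection attempt that never finishes.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        System.out.println(completesWithin(neverCompleted, 100)); // prints "false"

        // Stands in for a connection attempt that has already succeeded.
        Future<String> done = CompletableFuture.completedFuture("ok");
        System.out.println(completesWithin(done, 100)); // prints "true"
    }
}
```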

I have been debugging this as much as I can. The Netty code takes some time to 
get into, and I don't have a full grasp of it, although what I saw made sense. 
{{bootstrap.get()}} blocks in AbstractFuture, 
[parking|https://github.com/google/guava/blob/v27.0/guava/src/com/google/common/util/concurrent/AbstractFuture.java#L523]
 the thread. If you google a bit you'll find many people hitting blocked 
threads around this area, apparently because Guava makes some assumptions.

I am taking a break here as I am not making progress, so I need to go back to 
the drawing board on how to approach this. If somebody wants to take on the 
challenge, feel free to assign it to yourself.

> transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_and_cleanup
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16061
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16061
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Ekaterina Dimitrova
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> Failing here, also locally:
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/312/workflows/da4ce69c-e778-467e-b9f3-27ab166a8321/jobs/1945]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
