[ 
https://issues.apache.org/jira/browse/THRIFT-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876057#comment-17876057
 ] 

Yuxuan Wang commented on THRIFT-5814:
-------------------------------------

So far I have tried a few approaches, but none of them actually fixes the 
flakiness:

* Replace the TCP connection with a unix domain socket
* After the client establishes the connection, sleep for a short period of 
time, then do a connectivity check to make sure the connection is still good, 
and if not, retry establishing the client connection
* Disable TCP keep-alive
* Change TCP keep-alive to a much shorter interval
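
For reference, the second and third attempts above can be sketched with the 
standard net package roughly as follows. This is only an illustration, not the 
code used in the Thrift test: connAlive, dialIdle and demo are hypothetical 
helpers, the durations are arbitrary, and the check assumes the server never 
sends data on the connection. As noted above, these attempts did not fix the 
flakiness.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// connAlive reports whether an idle connection still looks open. It assumes
// the peer never sends data: a read that times out means "still connected",
// while EOF or any other result is treated as a broken connection.
func connAlive(conn net.Conn, wait time.Duration) bool {
	if err := conn.SetReadDeadline(time.Now().Add(wait)); err != nil {
		return false
	}
	defer conn.SetReadDeadline(time.Time{}) // clear the deadline again
	var buf [1]byte
	_, err := conn.Read(buf[:])
	return errors.Is(err, os.ErrDeadlineExceeded)
}

// dialIdle (hypothetical helper) dials with keep-alive disabled, waits for
// settle, and redials if the connection has already been dropped.
func dialIdle(network, addr string, attempts int, settle time.Duration) (net.Conn, error) {
	d := net.Dialer{KeepAlive: -1} // a negative value disables keep-alive probes
	for i := 0; i < attempts; i++ {
		conn, err := d.Dial(network, addr)
		if err != nil {
			return nil, err
		}
		time.Sleep(settle)
		if connAlive(conn, 10*time.Millisecond) {
			return conn, nil
		}
		conn.Close() // connection was dropped, try again
	}
	return nil, fmt.Errorf("no stable connection after %d attempts", attempts)
}

// demo runs dialIdle against a throwaway local listener that accepts
// connections and holds them open without reading or writing.
func demo() error {
	ln, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	done := make(chan struct{})
	defer close(done)
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() {
				<-done // hold the connection open until demo returns
				c.Close()
			}()
		}
	}()
	conn, err := dialIdle("tcp", ln.Addr().String(), 3, 20*time.Millisecond)
	if err != nil {
		return err
	}
	defer conn.Close()
	if !connAlive(conn, 10*time.Millisecond) {
		return errors.New("connection dropped while idle")
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("idle connection is still alive")
}
```

Passing "unix" and a socket path to dialIdle instead of "tcp" and a host:port 
would correspond to the first attempt in the list.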


> go: Flaky test TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop
> -----------------------------------------------------------------------
>
>                 Key: THRIFT-5814
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5814
>             Project: Thrift
>          Issue Type: Task
>          Components: Go - Library
>    Affects Versions: 0.20.0
>            Reporter: Yuxuan Wang
>            Priority: Minor
>
> Currently the 
> [TestNoHangDuringStopFromClientNoDataSendDuringAcceptLoop|https://github.com/apache/thrift/blob/cb9ceada554f47aa5ebbedfe3984de0983cf0226/lib/go/thrift/simple_server_test.go#L164]
>  test in the Go library can be flaky (it fails in roughly 1 out of 100 runs).
> What this test does is roughly:
> # Create a local server listening on a random local port (via localhost:0)
> # Create a TCP client that connects to the server (via net.Dial) but does 
> nothing after establishing the connection (so from the server's PoV this is 
> an idle client)
> # Try to shut down the server
> # Verify that shutting down the server took at least the configured timeout 
> that the server waits before forcefully closing idle client connections
> Step 4 can occasionally (rarely) fail because the server shuts down much 
> faster than expected. I did some digging, and the reason seems to be that the 
> client-server TCP connection is broken after being established (killed by the 
> OS or something?).
> So, to fix the flakiness of this test, we need to find a way to keep the 
> connection alive until the server kills it.
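
The four steps in the description can be sketched with a toy server built on 
the standard net package. This is a rough illustration of the shape of the 
test only: toyServer, measureStop and demoTimeout are hypothetical names, and 
the toy server is a stand-in, not the Thrift server implementation. The sketch 
waits until the server has accepted the connection before stopping it, which 
is a choice made for this example and not something taken from the real test.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// demoTimeout is an arbitrary stop timeout chosen for this sketch.
const demoTimeout = 50 * time.Millisecond

// toyServer is a hypothetical stand-in for the real server: it tracks accepted
// connections and force-closes them once the stop timeout expires.
type toyServer struct {
	ln       net.Listener
	timeout  time.Duration
	accepted chan struct{} // signaled once per accepted connection
	wg       sync.WaitGroup
	mu       sync.Mutex
	conns    []net.Conn
}

func (s *toyServer) serve() {
	for {
		c, err := s.ln.Accept()
		if err != nil {
			return // listener closed
		}
		s.mu.Lock()
		s.conns = append(s.conns, c)
		s.mu.Unlock()
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			var buf [1]byte
			c.Read(buf[:]) // blocks until data arrives or the connection closes
		}()
		s.accepted <- struct{}{}
	}
}

// stop closes the listener, waits up to timeout for connections to finish,
// then force-closes whatever is still open.
func (s *toyServer) stop() {
	s.ln.Close()
	done := make(chan struct{})
	go func() { s.wg.Wait(); close(done) }()
	select {
	case <-done:
	case <-time.After(s.timeout):
		s.mu.Lock()
		for _, c := range s.conns {
			c.Close()
		}
		s.mu.Unlock()
		<-done
	}
}

// measureStop runs steps 1-3 and returns how long the shutdown took.
func measureStop(timeout time.Duration) (time.Duration, error) {
	ln, err := net.Listen("tcp", "localhost:0") // step 1: random local port
	if err != nil {
		return 0, err
	}
	s := &toyServer{ln: ln, timeout: timeout, accepted: make(chan struct{}, 1)}
	go s.serve()
	client, err := net.Dial("tcp", ln.Addr().String()) // step 2: idle client
	if err != nil {
		ln.Close()
		return 0, err
	}
	defer client.Close()
	<-s.accepted // make sure the server has seen the connection
	start := time.Now()
	s.stop() // step 3: shut down the server
	return time.Since(start), nil
}

func main() {
	elapsed, err := measureStop(demoTimeout)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Step 4: the shutdown should take at least the configured timeout.
	fmt.Println("stop took at least the timeout:", elapsed >= demoTimeout)
}
```

If the client connection were broken before stop is called, the handler's Read 
would return early and stop would finish well before the timeout, which is the 
failure mode described in the issue.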



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
