nobodyiam opened a new issue #1744: Client side response time is slower than actual when client side is in TCP delayed ack mode
URL: https://github.com/apache/incubator-dubbo/issues/1744

## Issue Description

Dubbo's [netty 3 server implementation](https://github.com/apache/incubator-dubbo/blob/master/dubbo-remoting/dubbo-remoting-netty/src/main/java/com/alibaba/dubbo/remoting/transport/netty/NettyServer.java#L70) does not enable the `TCP_NODELAY` option, so the server side does not respond in time when the client side is in delayed ack mode and the response size is less than the MSS.

However, the [netty 4 server implementation](https://github.com/apache/incubator-dubbo/blob/master/dubbo-remoting/dubbo-remoting-netty4/src/main/java/com/alibaba/dubbo/remoting/transport/netty4/NettyServer.java#L81) does enable this option.

```java
bootstrap.group(bossGroup, workerGroup)
        .channel(NioServerSocketChannel.class)
        // TCP_NODELAY is enabled for accepted (child) channels on the next line
        .childOption(ChannelOption.TCP_NODELAY, Boolean.TRUE)
        .childOption(ChannelOption.SO_REUSEADDR, Boolean.TRUE)
        .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
```

Considering Netty 4 enables this option by default ([Enable TCP_NODELAY and SCTP_NODELAY by default](https://github.com/netty/netty/commit/39357f3835f971e6cc1a0e41a805fa1293e7005e)), Dubbo's [netty 3 server implementation](https://github.com/apache/incubator-dubbo/blob/master/dubbo-remoting/dubbo-remoting-netty/src/main/java/com/alibaba/dubbo/remoting/transport/netty/NettyServer.java#L70) should also enable it by default.

## Solution

Simply set this option when constructing the `ServerBootstrap`:

```java
bootstrap.setOption("child.tcpNoDelay", true);
```

## Issue Screenshots

Here is an actual example captured in our environment, with 10.5.160.181 as the client side and 10.5.169.180 as the server side.

### Case with normal ack

The demo server side logic takes 10-11 ms, so the client side response time is normally around 12 ms.
Frame 9, highlighted below, is a normal request whose request id is 0x02.

![image](https://user-images.githubusercontent.com/837658/39660725-87c0676c-5077-11e8-9059-be353ceebf97.png)

Frame 11, highlighted below, is the response, whose response id is 0x02.

![image](https://user-images.githubusercontent.com/837658/39660733-b84b750c-5077-11e8-9c72-bdb78da0f299.png)

The actual response time is 12 ms. The screenshots above also show that both the client side and the server side send `ack` normally.

### Case with delayed ack

Now let's look at what happens when the client side is in delayed ack mode. In the screenshots below there are no standalone `ack` packets, which means the connection is in delayed ack mode: each `ack` is returned together with a data packet.

Frame 239, highlighted below, is a request whose request id is 0x7f. Frame 240 arrives immediately after frame 239 is sent to the server. Since the server logic takes 10-11 ms, this packet cannot be the response to frame 239.

![image](https://user-images.githubusercontent.com/837658/39660790-700dc08c-5078-11e8-9934-61c630420ee0.png)

As we can see, frame 240's response id is 0x7e, so it is the response to the previous request.

![image](https://user-images.githubusercontent.com/837658/39660802-02df6a82-5079-11e8-8964-e8c3c5054ea1.png)

Then frame 241 is sent, whose request id is 0x80.

![image](https://user-images.githubusercontent.com/837658/39660818-3cb482f6-5079-11e8-8563-c1e1b2480905.png)

Only then is the response to request 0x7f returned (frame 242).

![image](https://user-images.githubusercontent.com/837658/39660824-5a960dc6-5079-11e8-9feb-3f6f154a2b49.png)

Request 0x7f was sent at 14:37:53.267, so the response was ready on the server side around 14:37:53.277. However, because the server side does not enable `TCP_NODELAY`, Nagle's algorithm prevents the response from being sent until an `ack` is received or a timeout (40 ms) occurs.
So the response is held on the server side and is only sent when the client side sends another request, which happened at 14:37:53.292730. In this case, the client side response time (25 ms) is much slower than the actual one (12 ms).

Even worse, in this situation the response time is determined by the interval at which the client side sends requests. We ran some experiments, and the results show that if the client side sends requests at 50 qps, the client side response time is 20 ms, because the request sending interval is 20 ms (1000/50). If the client side qps is 40, the response time is 25 ms. If the client side qps is 30, the response time is 33 ms. The response time keeps increasing as the qps decreases, until the qps reaches 25, because once the interval is larger than 40 ms (1000/25), TCP cancels delayed ack mode.

So I hope this quick fix can be applied so that we can always expect stable performance.

BTW, I could submit a PR if necessary.
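
The relationship between qps and client side response time described above can be sketched as a small model. This is only an illustration of the reported numbers, not code from Dubbo or Netty: the class name `DelayedAckLatencyModel`, the method `observedMillis`, and the 40 ms cut-off constant are all hypothetical, and the model assumes a single connection sending requests at a fixed interval.

```java
public class DelayedAckLatencyModel {
    // Assumption taken from the report: once the request interval exceeds
    // 40 ms, delayed ack mode is cancelled and responses are no longer held.
    public static final double DELAYED_ACK_LIMIT_MS = 40.0;

    /**
     * Estimates the response time seen by the client, in milliseconds.
     *
     * @param qps      requests per second sent on one connection
     * @param actualMs the real request/response time without any delay
     */
    public static double observedMillis(double qps, double actualMs) {
        double intervalMs = 1000.0 / qps;
        // Without TCP_NODELAY the response is held until the next request
        // arrives carrying the ack, so the client waits one full interval.
        boolean heldUntilNextRequest =
                intervalMs > actualMs && intervalMs <= DELAYED_ACK_LIMIT_MS;
        return heldUntilNextRequest ? intervalMs : actualMs;
    }

    public static void main(String[] args) {
        double actualMs = 12.0; // response time measured in the normal ack case
        for (int qps : new int[] {50, 40, 30, 20}) {
            System.out.printf("qps=%d observed=%.0f ms%n",
                    qps, observedMillis(qps, actualMs));
        }
    }
}
```

For 50, 40 and 30 qps the sketch returns 20, 25 and about 33 ms, matching the figures reported above. For 20 qps the interval (50 ms) is above the assumed cut-off, so it falls back to the actual 12 ms.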