[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051211#comment-17051211
 ] 

Ismael Juma commented on KAFKA-9648:
------------------------------------

Would you be interested in submitting a pull request?

> kafka server should resize backlog when create serversocket
> -----------------------------------------------------------
>
>                 Key: KAFKA-9648
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9648
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.10.0.1
>            Reporter: li xiangyuan
>            Priority: Minor
>
> I described a puzzling problem in KAFKA-9211 
> (https://issues.apache.org/jira/browse/KAFKA-9211): under some conditions 
> the Kafka server triggers TCP congestion control. We have finally found 
> the root cause.
> When a Kafka server restarts for any reason and a preferred replica leader 
> election is then executed, leadership of many replicas is handed back to 
> it, which triggers a cluster metadata update. All clients then establish 
> connections to this server. At that moment many TCP connection requests 
> are waiting in the TCP SYN queue, and then in the accept queue. 
> Kafka creates the server socket in SocketServer.scala: 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> This method takes a second parameter, "backlog"; 
> min(backlog, tcp_max_syn_backlog) determines the queue length. Because 
> Kafka does not set it, the default value of 50 is used.
> If this queue is full and tcp_syncookies = 0, new connection requests are 
> rejected. If tcp_syncookies = 1, the TCP SYN cookie mechanism is triggered. 
> This mechanism lets Linux handle more TCP SYN requests, but it loses many 
> TCP extension options, including "wscale" (window scaling), the option that 
> lets a TCP connection advertise a window larger than 64 KB and so keep far 
> more data in flight. Because the SYN cookie was triggered, wscale is lost, 
> and the connection stays very slow until it is closed and a new TCP 
> connection is established.
> So after a preferred replica election is executed, many new TCP connections 
> are established without wscale set, and much of the network traffic to 
> this server is very slow.
> I am not sure whether newer Linux versions have resolved this problem, but 
> Kafka should also set the backlog to a larger value. We have changed it to 
> 512, and everything seems to be OK.
>  
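
For illustration only, the change proposed in the description could look roughly like the sketch below, which passes an explicit backlog to the two-argument bind. This is not Kafka's actual code: the class name, the helper method, and the loopback address are hypothetical, and 512 is simply the value the reporter says worked for them. The kernel may still cap the effective queue length through its own limits.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical standalone sketch; not taken from SocketServer.scala.
final class BacklogBindSketch {
    // Value reported as working in the issue description; an assumption, not a recommendation.
    static final int BACKLOG = 512;

    // Opens a non-blocking server channel and binds it with an explicit backlog
    // instead of relying on the Java default of 50.
    static ServerSocketChannel openWithBacklog(InetSocketAddress address, int backlog)
            throws IOException {
        ServerSocketChannel serverChannel = ServerSocketChannel.open();
        serverChannel.configureBlocking(false);
        // Two-argument bind: the second argument is the requested backlog.
        serverChannel.socket().bind(address, backlog);
        return serverChannel;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, so the sketch runs without configuration.
        InetSocketAddress address = new InetSocketAddress("127.0.0.1", 0);
        try (ServerSocketChannel channel = openWithBacklog(address, BACKLOG)) {
            System.out.println("Listening on " + channel.getLocalAddress());
        }
    }
}
```

In a real patch the backlog would more plausibly come from a broker configuration setting than from a constant, so that operators can tune it alongside the kernel settings mentioned above.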



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
