li xiangyuan created KAFKA-9648:
-----------------------------------

             Summary: kafka server should resize backlog when create 
serversocket
                 Key: KAFKA-9648
                 URL: https://issues.apache.org/jira/browse/KAFKA-9648
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 0.10.0.1
            Reporter: li xiangyuan


I previously described a mysterious problem in KAFKA-9211 
(https://issues.apache.org/jira/browse/KAFKA-9211), where I found that the 
Kafka server triggers TCP congestion control under some conditions. We have 
finally found the root cause.

When a Kafka server restarts for any reason and a preferred replica leader 
election is then executed, leadership for many replicas is handed back to it, 
which triggers a cluster metadata update. All clients then establish 
connections to this server. At that moment, many TCP connection requests are 
waiting in the TCP SYN queue and then in the accept queue.

Kafka creates the server socket in SocketServer.scala:

 
{code:java}
serverChannel.socket.bind(socketAddress);{code}
This method takes a second parameter, "backlog"; min(backlog, 
tcp_max_syn_backlog) determines the queue length. Because Kafka does not set 
it, the default value of 50 is used.
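
As an illustration of the change being proposed, here is a minimal sketch of 
binding with an explicit backlog. The class name BacklogBindSketch, the method 
openWithBacklog and the constant BACKLOG are hypothetical names made up for 
this example; the value 512 is simply the one we tried. It is written in Java 
against the plain JDK API and is not the actual SocketServer.scala code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

final class BacklogBindSketch {
    // Hypothetical value: the backlog size tried in this report.
    static final int BACKLOG = 512;

    /**
     * Opens a non-blocking server channel and binds it with an explicit
     * backlog instead of relying on the JDK default.
     */
    static ServerSocketChannel openWithBacklog(InetSocketAddress address, int backlog)
            throws IOException {
        ServerSocketChannel serverChannel = ServerSocketChannel.open();
        try {
            serverChannel.configureBlocking(false);
            // ServerSocket.bind(SocketAddress, int): the second argument is the
            // requested backlog. A value of 0 or less makes the JDK fall back
            // to its implementation default.
            serverChannel.socket().bind(address, backlog);
            return serverChannel;
        } catch (IOException e) {
            // Do not leak the channel if the bind fails.
            serverChannel.close();
            throw e;
        }
    }
}
```

A caller could use it as 
BacklogBindSketch.openWithBacklog(socketAddress, BacklogBindSketch.BACKLOG). 
The operating system may still cap the requested value, so a larger backlog 
only helps if the kernel limits allow it.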

If this queue is full and tcp_syncookies = 0, new connection requests are 
rejected. If tcp_syncookies = 1, the TCP SYN cookie mechanism is triggered. 
This mechanism lets Linux handle more TCP SYN requests, but it loses many TCP 
options, including "wscale" (window scaling), the option that lets a TCP 
connection use a window larger than 64 KB and so keep much more data in 
flight. Because the SYN cookie was triggered, wscale is lost, and the 
connection stays very slow until it is closed and a new TCP connection is 
established.
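
To see which of these cases applies on a given broker host, the two kernel 
settings named above can be read from /proc/sys/net/ipv4. The following is 
only a rough sketch: it assumes a Linux host where those files are readable, 
the class and method names are hypothetical, and effectiveQueueLength just 
applies the min(backlog, tcp_max_syn_backlog) rule as stated in this report 
rather than modelling the kernel exactly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

final class SynQueueSettingsSketch {
    /** Reads an integer setting from /proc/sys/net/ipv4, or empty if unavailable. */
    static OptionalInt readSysctl(String name) {
        Path path = Paths.get("/proc/sys/net/ipv4", name);
        try {
            String text = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
            return OptionalInt.of(Integer.parseInt(text.trim()));
        } catch (IOException | NumberFormatException e) {
            // Not Linux, file missing, or unexpected content.
            return OptionalInt.empty();
        }
    }

    /** Queue length according to the rule described in this report. */
    static int effectiveQueueLength(int backlog, int tcpMaxSynBacklog) {
        return Math.min(backlog, tcpMaxSynBacklog);
    }

    public static void main(String[] args) {
        int backlog = 50; // default backlog mentioned in this report
        OptionalInt synBacklog = readSysctl("tcp_max_syn_backlog");
        OptionalInt syncookies = readSysctl("tcp_syncookies");

        if (synBacklog.isPresent()) {
            System.out.println("effective queue length: "
                    + effectiveQueueLength(backlog, synBacklog.getAsInt()));
        } else {
            System.out.println("tcp_max_syn_backlog not readable on this host");
        }
        if (syncookies.isPresent()) {
            System.out.println("tcp_syncookies = " + syncookies.getAsInt());
        } else {
            System.out.println("tcp_syncookies not readable on this host");
        }
    }
}
```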

So after a preferred replica leader election is executed, many new TCP 
connections are established without wscale set, and much of the network 
traffic to this server is very slow.

I am not sure whether newer Linux versions have resolved this problem, but 
Kafka should still set the backlog to a larger value. We have changed it to 
512, and everything seems OK so far.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
