[ https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Jacot resolved KAFKA-9648.
--------------------------------
Fix Version/s: 3.2.0
Reviewer: David Jacot
Resolution: Fixed
> Add configuration to adjust listen backlog size for Acceptor
> ------------------------------------------------------------
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.10.0.1
> Reporter: li xiangyuan
> Assignee: Haruki Okada
> Priority: Minor
> Fix For: 3.2.0
>
>
> I described a mysterious problem in
> https://issues.apache.org/jira/browse/KAFKA-9211, where I found that the Kafka
> server triggers TCP congestion control under some conditions. We have finally
> found the root cause.
> When a Kafka broker restarts for any reason and a preferred replica leader
> election is then executed, many partition leaderships move back to it and
> trigger a cluster metadata update. All clients then establish connections to
> this broker at once. At that moment, many TCP connection requests are waiting
> in the TCP SYN queue, and then in the accept queue.
> Kafka creates the server socket in SocketServer.scala:
>
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> This method has an overload with a second parameter, "backlog", and
> min(backlog, tcp_max_syn_backlog) decides the queue length. Because Kafka does
> not set it, the default value of 50 is used.
> If this queue is full and tcp_syncookies = 0, new connection requests are
> dropped. If tcp_syncookies = 1, the TCP SYN cookie mechanism is triggered.
> This mechanism lets Linux handle more SYN requests, but it loses several TCP
> options, including "wscale" (window scaling), the option that lets a TCP
> connection use a window larger than 64 KB and so keep many more bytes in
> flight. Because SYN cookies were used and wscale was lost, the connection
> transfers data very slowly for its whole lifetime, until it is closed and a
> new TCP connection is established.
> So after a preferred replica leader election, many new TCP connections are
> established without wscale, and much of the network traffic to this broker
> becomes very slow.
> I'm not sure whether newer Linux versions have resolved this problem, but
> Kafka should also set a larger backlog. We have now changed it to 512, and
> everything seems to be OK.
>
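The following is a minimal Java sketch of the two mechanisms in the quoted report. It does not reproduce the code that the 3.2.0 fix added to SocketServer.scala.

The sketch shows two things:
* Binding a `ServerSocketChannel` with an explicit backlog through `bind(SocketAddress, int)`, instead of the default of 50 used when no backlog is given.
* The throughput ceiling of a connection whose receive window cannot be scaled.

It rests on these assumptions:
* The class and method names are hypothetical.
* 512 is the reporter's value.
* The `somaxconn` value of 128 and the 10 ms round-trip time are illustrative assumptions.
* As in the report, the SYN-cookie connection ends up without window scaling.

On Linux, per listen(2), the requested backlog is also silently truncated to `net.core.somaxconn`.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical illustration class; not part of Kafka.
public class ListenBacklogSketch {
    // Largest window a receiver can advertise without the window scale option (16-bit field).
    static final long UNSCALED_MAX_WINDOW = 65_535L;

    // Bind with an explicit accept backlog instead of relying on the default of 50.
    static ServerSocketChannel openListener(int port, int backlog) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        try {
            channel.bind(new InetSocketAddress("127.0.0.1", port), backlog);
        } catch (IOException e) {
            channel.close();
            throw e;
        }
        return channel;
    }

    // Linux silently truncates the listen() backlog to net.core.somaxconn.
    static int effectiveBacklog(int requested, int somaxconn) {
        return Math.min(requested, somaxconn);
    }

    // Window after applying a negotiated window scale shift (0..14).
    static long scaledWindow(int shift) {
        return UNSCALED_MAX_WINDOW << shift;
    }

    // Upper bound on single-connection throughput: window / RTT, in bytes per second.
    static long maxBytesPerSecond(long windowBytes, long rttMillis) {
        return windowBytes * 1000 / rttMillis;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port; 512 is the reporter's backlog.
        try (ServerSocketChannel ch = openListener(0, 512)) {
            System.out.println("listening on " + ch.getLocalAddress());
        }
        System.out.println(effectiveBacklog(512, 128));                // 128
        System.out.println(maxBytesPerSecond(UNSCALED_MAX_WINDOW, 10)); // 6553500
        System.out.println(maxBytesPerSecond(scaledWindow(7), 10));    // 838848000
    }
}
{code}

Without window scaling, a receiver can advertise at most 65,535 bytes. At the assumed 10 ms RTT, a single connection therefore tops out at about 6.5 MB/s, while a scale shift of 7 raises that ceiling 128-fold. A backlog of 512 only helps if `somaxconn` is at least that large.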
--
This message was sent by Atlassian Jira
(v8.20.1#820001)