[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2021-07-22 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385313#comment-17385313
 ] 

Haruki Okada commented on KAFKA-9648:
-

[~flashmouse] Hi, I proposed a new KIP regarding this.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor

Please take a look if you're interested !

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have describe a mystery problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). This issue I found kafka 
> server will trigger tcp Congestion Control in some condition. finally we 
> found the root cause.
> when kafka server restart for any reason and then execute preferred replica 
> leader, lots of replica leader will give back to it & trigger cluster 
> metadata update. then all clients will establish connection to this server. 
> at the monment many tcp estable request are waiting in the tcp sync queue , 
> and then to accept queue. 
> kafka create serversocket in SocketServer.scala 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> this method has second parameter "backlog", min(backlog,tcp_max_syn_backlog) 
> will decide the queue length.beacues kafka haven't set ,it is default value 
> 50.
> if this queue is full, and tcp_syncookies = 0, then new connection request 
> will be rejected. If tcp_syncookies=1, it will trigger the tcp synccookie 
> mechanism. this mechanism could allow linux handle more tcp sync request, but 
> it would lose many tcp external parameter, include "wscale", the one that 
> allow tcp connection to send much more bytes per tcp package. because 
> syncookie triggerd, wscale has lost, and this tcp connection will handle 
> network very slow, forever,until this connection is closed and establish 
> another tcp connection.
> so after a preferred repilca executed, lots of new tcp connection will 
> establish without set wscale,and many network traffic to this server will 
> have a very slow speed.
> i'm not sure whether new linux version have resolved this problem, but kafka 
> also should set backlog a larger value. we now have modify this to 512, seems 
> everything is ok.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2021-07-12 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379515#comment-17379515
 ] 

Haruki Okada commented on KAFKA-9648:
-

Hi.


 We operate a Kafka cluster in our company which has 130+ brokers, 1.3M+ total 
socket server connections and 25K+ partitions.
 We faced very similar issue to 
https://issues.apache.org/jira/browse/KAFKA-9211 (i.e. producer slowing down 
mysteriously with small TCP packets) recently, and we figured out the cause.
 We hope our knowledge could help.
h2. Environment:
 * Kafka broker version: 2.4.1 (but we suppose the version doesn't matter)
 * Kafka broker OS: CentOS7 (kernel 3.10.X)

h2. Phenomenon:
 * Restart a broker process, and execute preferred leader election after the 
broker became in-sync
 * Some producers's node-request-latency to the broker got insanely higher than 
usual
 ** However no such high produce response latency is observed on broker-side 
metrics
 * As the result, producer batches couldn't be sent out in sufficient pace, 
then caused batch expiration

h2. Analysis:
 * We observed TCP SYN cookies metric was increased at incidental time, with 
following dmesg message:
 ** 
{code:java}
TCP: request_sock_TCP: Possible SYN flooding on port 22991. Sending cookies.  
Check SNMP counters.{code}

 * So we also suspected the phenomenon is due to `wscale` drop (as like 
described in this issue), but we doubt it because:
 ** Even with TCP SYN cookies, `wscale` should be available as long as TCP 
timestamp is enabled. (refs: 
[https://blog.cloudflare.com/syn-packet-handling-in-the-wild/])
 ** In our environment, TCP timestamp is enabled.
 ** Also, generally, `wscale` (window scaling factor) is used for extending 
window beyond 65535 (max window size in TCP spec) on large-network round trip 
environment such as internet
 *** Our typical produce request size is smaller than that
 *** So it's hard to imagine that `wscale` drop causes such significant 
request-latency degradation
 * After several attempts to reproduce, we found out that receiver (i.e. broker 
in this context)'s `wscale` is inconsistent between producer and broker at 
incidental time.
 ** receiver's `wscale` advertised from broker -> producer along with SYN+ACK 
packet: 7
 *** 
{code:java}
17:32:05.161213 IP BROKER.HOST. > CLIENT.: Flags [S.], seq 29292755, 
ack 17469019, win 28960, options [mss 1460,sackOK,TS val 25589601 ecr 
9055245,nop,wscale 1], length 0{code}

 *** (seq numbers are substituted with random value)
 ** actual receiver's `wscale` after established: 1
 *** 
{code:java}
[user@BROKER ~]$ /sbin/ss -e -i -t | less
...
ESTAB  0  0  BROKER.                CLIENT.                 
timer:(keepalive,21min,0) uid:503 ino:15143860 sk:9ba25dc4f440 <-> 
ts sack cubic wscale:7,7 rto:201 rtt:0.179/0.006 ato:140 mss:1448 rcvmss:1448 
advmss:1448 cwnd:10 bytes_acked:97054 bytes_received:18950537 segs_out:15094 
segs_in:13941 send 647.2Mbps lastsnd:217 lastrcv:17 lastack:217 pacing_rate 
1288.0Mbps rcv_rtt:1.875 rcv_space:29200{code}

 *** `wscale:7,7` means that broker's receiver window scaling factor is 7
 * Okay, then this inconsistency could explain the phenomenon as below:
 ** Premise: When `wscale` is enabled, TCP window size is calculated by 
`window_size * 2^wscale`
 ** When broker calculates advertised window size, it's bit-shifted to the 
right by `wscale` (== 7)
 *** [https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_output.c#L290]
 ** On the other hand, producer multiplies advertised window size by 1, which 
is conveyed through SYN+ACK
 ** As the result, window size became 64 times smaller than expected
 ** Then producer splits TCP packets to much smaller size than usual (possibly 
under MSS)
 *** TCP acks are delayed due to the conditions are not met 
([https://github.com/torvalds/linux/blob/v3.10/net/ipv4/tcp_input.c#L4760])
 * Last remaining question is "why such wscale inconsistency happened?"
 ** Read through the kernel source code, then we found that there's an issue in 
the logic to calculate wscale if TCP connection is established through 
syncookies
 *** It's fixed in this commit: 
[https://github.com/torvalds/linux/commit/909172a149749242990a6e64cb55d55460d4e417]

h2. Solution:
 * 1. Upgrade linux kernel to at least v5.10 (which the patch is committed)
 * 2. Disable SYN cookies
 ** i.e. setting `net.ipv4.tcp_syncookies` kernel parameter to 0
 ** With SYN cookies disabled, some SYN's are dropped on same situations but 
typically it's not be a serious problem thanks to SYN retries. Clients should 
reconnect soon.
 * 3. Adjust backlog size (as this ticket suggests)
 ** Even disabling SYN cookies work thanks to SYN retries, it will cause 
certain delay in TCP establishment if reconnect happened.
 ** So generally it's preferable to adjust backlog size as well, to avoid SYN 
drops in the first place.

 

I think we should add new config 

[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-06 Thread li xiangyuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053195#comment-17053195
 ] 

li xiangyuan commented on KAFKA-9648:
-

[~junrao]: Yes 1412 is quite small, I have noticed that but couldn't understand 
why. In tcpdump file it shows after connection established, server's windows 
size suddenly drop down from x to 394, and producer's window size keep 
31088.  I think it's beyond my ability & knowledge to explain the reason ...

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have describe a mystery problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). This issue I found kafka 
> server will trigger tcp Congestion Control in some condition. finally we 
> found the root cause.
> when kafka server restart for any reason and then execute preferred replica 
> leader, lots of replica leader will give back to it & trigger cluster 
> metadata update. then all clients will establish connection to this server. 
> at the monment many tcp estable request are waiting in the tcp sync queue , 
> and then to accept queue. 
> kafka create serversocket in SocketServer.scala 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> this method has second parameter "backlog", min(backlog,tcp_max_syn_backlog) 
> will decide the queue length.beacues kafka haven't set ,it is default value 
> 50.
> if this queue is full, and tcp_syncookies = 0, then new connection request 
> will be rejected. If tcp_syncookies=1, it will trigger the tcp synccookie 
> mechanism. this mechanism could allow linux handle more tcp sync request, but 
> it would lose many tcp external parameter, include "wscale", the one that 
> allow tcp connection to send much more bytes per tcp package. because 
> syncookie triggerd, wscale has lost, and this tcp connection will handle 
> network very slow, forever,until this connection is closed and establish 
> another tcp connection.
> so after a preferred repilca executed, lots of new tcp connection will 
> establish without set wscale,and many network traffic to this server will 
> have a very slow speed.
> i'm not sure whether new linux version have resolved this problem, but kafka 
> also should set backlog a larger value. we now have modify this to 512, seems 
> everything is ok.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-05 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052419#comment-17052419
 ] 

Jun Rao commented on KAFKA-9648:


[~flashmouse] : Thanks for the reply. Just to clarify. TCP window scaling 
allows receiving window size more than 64KB. But it seems that the 1412 bytes 
used by the producer is much smaller than 64KB. So, even with window scaling 
turned off, it seems that the producer should be able to send more bytes before 
being ack-ed?

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have describe a mystery problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). This issue I found kafka 
> server will trigger tcp Congestion Control in some condition. finally we 
> found the root cause.
> when kafka server restart for any reason and then execute preferred replica 
> leader, lots of replica leader will give back to it & trigger cluster 
> metadata update. then all clients will establish connection to this server. 
> at the monment many tcp estable request are waiting in the tcp sync queue , 
> and then to accept queue. 
> kafka create serversocket in SocketServer.scala 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> this method has second parameter "backlog", min(backlog,tcp_max_syn_backlog) 
> will decide the queue length.beacues kafka haven't set ,it is default value 
> 50.
> if this queue is full, and tcp_syncookies = 0, then new connection request 
> will be rejected. If tcp_syncookies=1, it will trigger the tcp synccookie 
> mechanism. this mechanism could allow linux handle more tcp sync request, but 
> it would lose many tcp external parameter, include "wscale", the one that 
> allow tcp connection to send much more bytes per tcp package. because 
> syncookie triggerd, wscale has lost, and this tcp connection will handle 
> network very slow, forever,until this connection is closed and establish 
> another tcp connection.
> so after a preferred repilca executed, lots of new tcp connection will 
> establish without set wscale,and many network traffic to this server will 
> have a very slow speed.
> i'm not sure whether new linux version have resolved this problem, but kafka 
> also should set backlog a larger value. we now have modify this to 512, seems 
> everything is ok.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-04 Thread li xiangyuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051780#comment-17051780
 ] 

li xiangyuan commented on KAFKA-9648:
-

[~junrao]:  all our kafka server have the same tcp configuration below, left 
are default value:
{code:java}
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
{code}
I have upload tcpdump file (https://issues.apache.org/jira/browse/KAFKA-9211) 
when this problem occur. it mentioned that every producer tcp packet size are 
small than 1412 bytes and producer won't send next tcp package until get 
server's tcp ack that always delay 40ms(sometimes 200ms). it means  a kafka 
produce request larger than 1412 bytes will split to 2 tcp packages and need at 
least 40ms to complete send all applcation request to kafka server, the request 
larger , it cost multi 40ms to complete.

 

[~ijuma] glad do it! just change 
{code:java}
serverChannel.socket.bind(socketAddress);
{code}
to
{code:java}
serverChannel.socket.bind(socketAddress, 512);
{code}
or anything else?

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have describe a mystery problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). This issue I found kafka 
> server will trigger tcp Congestion Control in some condition. finally we 
> found the root cause.
> when kafka server restart for any reason and then execute preferred replica 
> leader, lots of replica leader will give back to it & trigger cluster 
> metadata update. then all clients will establish connection to this server. 
> at the monment many tcp estable request are waiting in the tcp sync queue , 
> and then to accept queue. 
> kafka create serversocket in SocketServer.scala 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> this method has second parameter "backlog", min(backlog,tcp_max_syn_backlog) 
> will decide the queue length.beacues kafka haven't set ,it is default value 
> 50.
> if this queue is full, and tcp_syncookies = 0, then new connection request 
> will be rejected. If tcp_syncookies=1, it will trigger the tcp synccookie 
> mechanism. this mechanism could allow linux handle more tcp sync request, but 
> it would lose many tcp external parameter, include "wscale", the one that 
> allow tcp connection to send much more bytes per tcp package. because 
> syncookie triggerd, wscale has lost, and this tcp connection will handle 
> network very slow, forever,until this connection is closed and establish 
> another tcp connection.
> so after a preferred repilca executed, lots of new tcp connection will 
> establish without set wscale,and many network traffic to this server will 
> have a very slow speed.
> i'm not sure whether new linux version have resolved this problem, but kafka 
> also should set backlog a larger value. we now have modify this to 512, seems 
> everything is ok.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-04 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051699#comment-17051699
 ] 

Jun Rao commented on KAFKA-9648:


[~flashmouse]: Thanks for the analysis. Great finding! The impact seems on the 
producer. Did you send a large tcp socket buffer for the producer? Was the 
network latency between the producer and the broker high?

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have describe a mystery problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). This issue I found kafka 
> server will trigger tcp Congestion Control in some condition. finally we 
> found the root cause.
> when kafka server restart for any reason and then execute preferred replica 
> leader, lots of replica leader will give back to it & trigger cluster 
> metadata update. then all clients will establish connection to this server. 
> at the monment many tcp estable request are waiting in the tcp sync queue , 
> and then to accept queue. 
> kafka create serversocket in SocketServer.scala 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> this method has second parameter "backlog", min(backlog,tcp_max_syn_backlog) 
> will decide the queue length.beacues kafka haven't set ,it is default value 
> 50.
> if this queue is full, and tcp_syncookies = 0, then new connection request 
> will be rejected. If tcp_syncookies=1, it will trigger the tcp synccookie 
> mechanism. this mechanism could allow linux handle more tcp sync request, but 
> it would lose many tcp external parameter, include "wscale", the one that 
> allow tcp connection to send much more bytes per tcp package. because 
> syncookie triggerd, wscale has lost, and this tcp connection will handle 
> network very slow, forever,until this connection is closed and establish 
> another tcp connection.
> so after a preferred repilca executed, lots of new tcp connection will 
> establish without set wscale,and many network traffic to this server will 
> have a very slow speed.
> i'm not sure whether new linux version have resolved this problem, but kafka 
> also should set backlog a larger value. we now have modify this to 512, seems 
> everything is ok.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-04 Thread Ismael Juma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051211#comment-17051211
 ] 

Ismael Juma commented on KAFKA-9648:


Would you be interested in submitting a pull request?

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have describe a mystery problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). This issue I found kafka 
> server will trigger tcp Congestion Control in some condition. finally we 
> found the root cause.
> when kafka server restart for any reason and then execute preferred replica 
> leader, lots of replica leader will give back to it & trigger cluster 
> metadata update. then all clients will establish connection to this server. 
> at the monment many tcp estable request are waiting in the tcp sync queue , 
> and then to accept queue. 
> kafka create serversocket in SocketServer.scala 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> this method has second parameter "backlog", min(backlog,tcp_max_syn_backlog) 
> will decide the queue length.beacues kafka haven't set ,it is default value 
> 50.
> if this queue is full, and tcp_syncookies = 0, then new connection request 
> will be rejected. If tcp_syncookies=1, it will trigger the tcp synccookie 
> mechanism. this mechanism could allow linux handle more tcp sync request, but 
> it would lose many tcp external parameter, include "wscale", the one that 
> allow tcp connection to send much more bytes per tcp package. because 
> syncookie triggerd, wscale has lost, and this tcp connection will handle 
> network very slow, forever,until this connection is closed and establish 
> another tcp connection.
> so after a preferred repilca executed, lots of new tcp connection will 
> establish without set wscale,and many network traffic to this server will 
> have a very slow speed.
> i'm not sure whether new linux version have resolved this problem, but kafka 
> also should set backlog a larger value. we now have modify this to 512, seems 
> everything is ok.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-04 Thread Ismael Juma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051209#comment-17051209
 ] 

Ismael Juma commented on KAFKA-9648:


Thanks for the ticket, interesting discovery. cc [~junrao] 
[~rajinisiva...@gmail.com]

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have describe a mystery problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). This issue I found kafka 
> server will trigger tcp Congestion Control in some condition. finally we 
> found the root cause.
> when kafka server restart for any reason and then execute preferred replica 
> leader, lots of replica leader will give back to it & trigger cluster 
> metadata update. then all clients will establish connection to this server. 
> at the monment many tcp estable request are waiting in the tcp sync queue , 
> and then to accept queue. 
> kafka create serversocket in SocketServer.scala 
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> this method has second parameter "backlog", min(backlog,tcp_max_syn_backlog) 
> will decide the queue length.beacues kafka haven't set ,it is default value 
> 50.
> if this queue is full, and tcp_syncookies = 0, then new connection request 
> will be rejected. If tcp_syncookies=1, it will trigger the tcp synccookie 
> mechanism. this mechanism could allow linux handle more tcp sync request, but 
> it would lose many tcp external parameter, include "wscale", the one that 
> allow tcp connection to send much more bytes per tcp package. because 
> syncookie triggerd, wscale has lost, and this tcp connection will handle 
> network very slow, forever,until this connection is closed and establish 
> another tcp connection.
> so after a preferred repilca executed, lots of new tcp connection will 
> establish without set wscale,and many network traffic to this server will 
> have a very slow speed.
> i'm not sure whether new linux version have resolved this problem, but kafka 
> also should set backlog a larger value. we now have modify this to 512, seems 
> everything is ok.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)