On Sun, Sep 17, 2017 at 11:43 AM, Oleksandr Natalenko <oleksa...@natalenko.name> wrote:
> Hi.
>
> Just to note that it looks like disabling RACK and re-enabling FACK prevents
> warning from happening:
>
> net.ipv4.tcp_fack = 1
> net.ipv4.tcp_recovery = 0
>
> Hope I get semantics of these tunables right. Thanks.
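For reference, here is a minimal sketch of how the two tunables quoted
above could be interpreted. It assumes the semantics documented for
kernels of this era: tcp_fack is a plain on/off switch, and bit 0x1 of
tcp_recovery enables RACK loss detection. The function names are
hypothetical helpers for illustration, not part of any kernel tooling.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, as printed by sysctl, into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip(), 0)
    return settings


def describe(settings):
    """Report which loss-detection schemes the settings enable.

    Assumed defaults (kernels >= 4.11): tcp_recovery = 1, tcp_fack = 0.
    """
    rack = bool(settings.get("net.ipv4.tcp_recovery", 1) & 0x1)
    fack = bool(settings.get("net.ipv4.tcp_fack", 0))
    return {"rack": rack, "fack": fack}


reported = parse_sysctl("net.ipv4.tcp_fack = 1\nnet.ipv4.tcp_recovery = 0\n")
print(describe(reported))
```

Under those assumptions, the reported configuration does turn RACK off
and FACK on.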
One difference between RACK and FACK is that RACK can detect lost
retransmissions in CA_Recovery (fast recovery) and CA_Loss (post-RTO)
mode, while the current FACK cannot. A previous FACK version could also
detect lost retransmissions in CA_Recovery with limited transmit.

I suspect it is RACK's special ability that triggers this warning. IMO,
however, the warning itself is of questionable validity: with undo (TCP
Eifel), the sender can detect a false CA_Recovery / CA_Loss and revert
it to CA_Open while spurious retransmissions are still in flight
(tp->retrans_out > 0). Another SACK arriving after the undo then
triggers this warning.

Neal and I are not sure if this is causing the panics you're seeing, but
personally I'd argue this warning is false, or at least should be
revised to skip the undo case.

>
> On Friday, 15 September 2017 21:04:36 CEST Oleksandr Natalenko wrote:
>> Hello.
>>
>> With net.ipv4.tcp_fack set to 0 the warning still appears:
>>
>> ===
>> » sysctl net.ipv4.tcp_fack
>> net.ipv4.tcp_fack = 0
>>
>> » LC_TIME=C dmesg -T | grep WARNING
>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:48:37 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:48:55 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>>
>> » ps -up 711
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> root       711  4.3  0.0      0     0 ?        S    18:12   7:23 [irq/123-enp3s0]
>> ===
>>
>> Any suggestions?
>>
>> On Friday, 15 September 2017 16:03:00 CEST Neal Cardwell wrote:
>> > Thanks for testing that. That is a very useful data point.
>> >
>> > I was able to cook up a packetdrill test that could put the connection
>> > in CA_Disorder with retransmitted packets out, but not in CA_Open.
>> > So we do not yet have a test case to reproduce this.
>> >
>> > We do not see this warning on our fleet at Google. One significant
>> > difference I see between our environment and yours is that it seems
>> > you run with FACK enabled:
>> >
>> >   net.ipv4.tcp_fack = 1
>> >
>> > Note that FACK was disabled by default (since it was replaced by RACK)
>> > between kernel v4.10 and v4.11. And this is exactly the time when this
>> > bug started manifesting itself for you and some others, but not our
>> > fleet. So my new working hypothesis would be that this warning is due
>> > to a behavior that only shows up in kernels >=4.11 when FACK is
>> > enabled.
>> >
>> > Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
>> > boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
>> > reboot, and test the kernel for a few days to see if the warning still
>> > pops up?
>> >
>> > thanks,
>> > neal
>> >
>> > [ps: apologies for the previous, mis-formatted post...]
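
To make the undo scenario from this thread concrete, here is a toy
model in Python. It is only a sketch, not kernel code: the ToySender
class, its "undone" flag, and the skip_undo_case parameter are all
hypothetical, and the warning condition (CA_Open with retrans_out
non-zero) is assumed from the description in this thread rather than
quoted from tcp_input.c.

```python
from dataclasses import dataclass

CA_OPEN = "CA_Open"
CA_RECOVERY = "CA_Recovery"


@dataclass
class ToySender:
    """Toy stand-in for the few sender fields discussed in this thread."""
    ca_state: str = CA_OPEN
    retrans_out: int = 0   # retransmitted segments still in flight
    undone: bool = False   # hypothetical flag: last recovery was undone

    def enter_recovery(self, retransmits: int) -> None:
        # Loss is suspected, so segments are retransmitted.
        self.ca_state = CA_RECOVERY
        self.retrans_out += retransmits
        self.undone = False

    def undo(self) -> None:
        # Eifel-style undo: the recovery is judged spurious and the state
        # reverts to CA_Open, but the retransmissions are still in flight.
        self.ca_state = CA_OPEN
        self.undone = True

    def warns_on_sack(self, skip_undo_case: bool = False) -> bool:
        # Assumed check: CA_Open with retransmissions outstanding.
        if skip_undo_case and self.undone:
            return False
        return self.ca_state == CA_OPEN and self.retrans_out != 0


sender = ToySender()
sender.enter_recovery(retransmits=2)
sender.undo()
print(sender.warns_on_sack())                     # True
print(sender.warns_on_sack(skip_undo_case=True))  # False
```

In this model the check fires only because undo moved the state back to
CA_Open before the spurious retransmissions had left the network, which
is the case the revised warning would skip.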