On Sun, Sep 17, 2017 at 11:43 AM, Oleksandr Natalenko
<oleksa...@natalenko.name> wrote:
> Hi.
>
> Just a note: disabling RACK and re-enabling FACK appears to prevent the
> warning from appearing:
>
> net.ipv4.tcp_fack = 1
> net.ipv4.tcp_recovery = 0
>
> I hope I have the semantics of these tunables right.
Thanks.
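
For reference, the two quoted settings can be read mechanically with a
small sketch like the one below. It assumes the documented meaning of the
tunables: bit 0x1 of net.ipv4.tcp_recovery enables RACK loss detection,
and a non-zero net.ipv4.tcp_fack enables FACK. The helper names
(parse_sysctl_lines, describe) and the fallback defaults are illustrative
assumptions, not part of any kernel or sysctl tooling.

```python
# Assumption: bit 0x1 of net.ipv4.tcp_recovery enables RACK loss detection.
RACK_LOSS_DETECTION = 0x1


def parse_sysctl_lines(text):
    """Parse 'key = value' lines (as printed by sysctl) into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key/value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def describe(settings):
    """Report whether RACK and FACK would be enabled for these settings.

    Missing keys fall back to assumed defaults for kernels >= 4.11:
    tcp_recovery = 1 (RACK on) and tcp_fack = 0 (FACK off).
    """
    rack = bool(settings.get("net.ipv4.tcp_recovery", 1) & RACK_LOSS_DETECTION)
    fack = settings.get("net.ipv4.tcp_fack", 0) != 0
    return {"rack": rack, "fack": fack}


if __name__ == "__main__":
    reported = """
    net.ipv4.tcp_fack = 1
    net.ipv4.tcp_recovery = 0
    """
    print(describe(parse_sysctl_lines(reported)))
```

With the values reported above, this reads as RACK disabled and FACK
enabled, which matches the intent described in the quoted message.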

One difference between RACK and FACK is that RACK can detect lost
retransmissions in CA_Recovery (fast recovery) and CA_Loss (post-RTO)
mode, while the current FACK cannot. A previous FACK version could also
detect lost retransmissions in CA_Recovery with limited transmit. I
suspect it is this special ability of RACK that triggers the warning.

IMO, however, the validity of this warning is itself questionable: with
undo (TCP Eifel), the sender can detect a false CA_Recovery / CA_Loss and
revert to CA_Open while spurious retransmissions are still in flight
(tp->retrans_out > 0). Another SACK arriving after the undo then triggers
the warning. Neal and I are not sure whether this is causing the panics
you're seeing, but personally I'd argue this warning is false, or at least
should be revised to skip the undo case.
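
To make the sequence concrete, here is a toy model of it. This is not
kernel code: ToySender, its methods, and the skip_undo_case option are
all hypothetical, and the check only mirrors the condition described
above (CA_Open with retrans_out > 0). The skip_undo_case flag is one
possible reading of "skip the undo case", not a proposed patch.

```python
from dataclasses import dataclass

CA_OPEN = "CA_Open"
CA_RECOVERY = "CA_Recovery"


@dataclass
class ToySender:
    """Toy stand-in for the few fields involved; not the kernel's tcp_sock."""
    ca_state: str = CA_OPEN
    retrans_out: int = 0   # retransmitted segments still in flight
    undone: bool = False   # set when recovery was reverted by undo

    def enter_recovery(self, retransmits):
        """Enter fast recovery and send some retransmissions."""
        self.ca_state = CA_RECOVERY
        self.retrans_out += retransmits
        self.undone = False

    def undo(self):
        """Eifel-style undo: recovery is judged spurious, so the state goes
        back to CA_Open. The retransmissions already sent remain in flight,
        so retrans_out is left untouched."""
        self.ca_state = CA_OPEN
        self.undone = True

    def warning_fires(self, skip_undo_case=False):
        """Return True if the modelled warning would fire on the next SACK."""
        if self.ca_state != CA_OPEN or self.retrans_out == 0:
            return False
        if skip_undo_case and self.undone:
            return False
        return True


if __name__ == "__main__":
    s = ToySender()
    s.enter_recovery(retransmits=2)
    s.undo()
    print(s.ca_state, s.retrans_out, s.warning_fires())
    print(s.warning_fires(skip_undo_case=True))
```

In this model the warning fires right after the undo because the state is
CA_Open while retransmissions are still counted as outstanding; with the
hypothetical skip_undo_case option the same situation is not flagged.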


>
> On Friday, 15 September 2017 21:04:36 CEST Oleksandr Natalenko wrote:
>> Hello.
>>
>> With net.ipv4.tcp_fack set to 0 the warning still appears:
>>
>> ===
>> » sysctl net.ipv4.tcp_fack
>> net.ipv4.tcp_fack = 0
>>
>> » LC_TIME=C dmesg -T | grep WARNING
>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:40:30 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:48:37 2017] WARNING: CPU: 1 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>> [Fri Sep 15 20:48:55 2017] WARNING: CPU: 0 PID: 711 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert+0x7c8/0x990
>>
>> » ps -up 711
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> root       711  4.3  0.0      0     0 ?        S    18:12   7:23 [irq/123-enp3s0]
>> ===
>>
>> Any suggestions?
>>
>> On Friday, 15 September 2017 16:03:00 CEST Neal Cardwell wrote:
>> > Thanks for testing that. That is a very useful data point.
>> >
>> > I was able to cook up a packetdrill test that could put the connection
>> > in CA_Disorder with retransmitted packets out, but not in CA_Open. So
>> > we do not yet have a test case to reproduce this.
>> >
>> > We do not see this warning on our fleet at Google. One significant
>> > difference I see between our environment and yours is that it seems
>> > you run with FACK enabled:
>> >
>> >   net.ipv4.tcp_fack = 1
>> >
>> > Note that FACK was disabled by default (since it was replaced by RACK)
>> > between kernel v4.10 and v4.11. And this is exactly the time when this
>> > bug started manifesting itself for you and some others, but not our
>> > fleet. So my new working hypothesis would be that this warning is due
>> > to a behavior that only shows up in kernels >=4.11 when FACK is
>> > enabled.
>> >
>> > Would you be able to disable FACK ("sysctl net.ipv4.tcp_fack=0" at
>> > boot, or net.ipv4.tcp_fack=0 in /etc/sysctl.conf, or equivalent),
>> > reboot, and test the kernel for a few days to see if the warning still
>> > pops up?
>> >
>> > thanks,
>> > neal
>> >
>> > [ps: apologies for the previous, mis-formatted post...]
>
>
