On 03/09/2018 06:11 AM, Ilpo Järvinen wrote:
On Wed, 7 Mar 2018, Yuchung Cheng wrote:

On Wed, Mar 7, 2018 at 12:19 PM, Neal Cardwell <ncardw...@google.com> wrote:

On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen <ilpo.jarvi...@helsinki.fi> wrote:
A bogus undo may/will trigger when the loss recovery state is
kept until snd_una is above high_seq. If tcp_any_retrans_done
is zero, retrans_stamp is cleared in this transient state. On
the next ACK, tcp_try_undo_recovery again executes and
tcp_may_undo will always return true because tcp_packet_delayed
has this condition:
     return !tp->retrans_stamp || ...

Check for the false fast retransmit transient condition in
tcp_packet_delayed to avoid bogus undos. Since snd_una may have
advanced on this ACK but CA state still remains unchanged,
prior_snd_una needs to be passed instead of tp->snd_una.

This one also seems like a case where it would be nice to have a
specific packet-by-packet example, or trace, or packetdrill scenario.
Something that we might be able to translate into a test, or at least
to document the issue more explicitly.

I am hesitate for further logic to make undo "perfect" on non-sack
cases b/c undo is very complicated and SACK is extremely
well-supported today. so a trace to demonstrate how severe this issue
is appreciated.

This is not just some remote corner cases to which I perhaps would
understand your "making undo perfect" comment. Those undos result in
a burst that, at worst, triggers additional buffer overflow because the
correct CC action is cancelled. Unfortunately I don't have now permission
to publish the time-seq graph about it but I've tried to improve the
changelog messages so that you can better understand under which
conditions the problem occurs.

SACK case remains the same even after this change. I did rework the
logic a bit though (pass prior_snd_una rather than flag around) to
make it more obvious but the change is unfortunately lengthy (no matter
what I pass through the call-chain).



Can you provide packetdrill test(s) then if you can not provide traces ?

If we can not have tests, we will likely have future regressions, since
most developments are using SACK in mind.

Fact that these bugs have been unnoticed for years is concerning.

We have the goal of updating packetdrill and publish soon our regression suite (about 1500 TCP tests)

CC Willem who is working on that.


Reply via email to