On Mon, Oct 30, 2017 at 11:17 PM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
>
> On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <eduma...@google.com>
> >
> > Based on SNMP values provided by Roman, Yuchung made the observation
> > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> >
> > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > in front of the write queue, we were not updating tcp highest sack.
> >
> > If one skb is freed because all its content was copied to the new skb
> > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> >
> > Bad things would then happen, including infinite loops.
> >
> > This patch renames tcp_highest_sack_combine() and uses it
> > from tcp_mtu_probe() to fix the bug.
> >
> > Note that I also removed one test against tp->sacked_out,
> > since we want to replace tp->highest_sack regardless of whatever
> > condition, since keeping a stale pointer to freed skb is a recipe
> > for disaster.
> >
> > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > Signed-off-by: Eric Dumazet <eduma...@google.com>
> > Reported-by: Alexei Starovoitov <alexei.starovoi...@gmail.com>
> > Reported-by: Roman Gushchin <g...@fb.com>
> > Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name>
>
> Thanks!
>
> Acked-by: Alexei Starovoitov <a...@kernel.org>
>
> wow. a bug from 2007.
> Any idea why it only started to bite us in 4.11 ?
FWIW, a random guess:
Since RACK was confirmed to trigger the issue, and RACK enables
detecting lost retransmissions w/o limited-transmit in the CA_Loss
state, I guess RACK creates a new type of "fast retransmit" that caused
some previously impossible SACK during MTU probing.

Acked-by: Yuchung Cheng <ych...@google.com>


>
> It's not trivial for us to reproduce it, but we will definitely
> test the patch as soon as we can.
> Do you have a packetdrill test or something for easy repro?
>
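
For readers following the thread, the stale-pointer scenario from the
commit message can be illustrated with a small userspace sketch. This is
not the kernel code: struct toy_skb, struct toy_sock, toy_mtu_probe() and
toy_highest_sack_replace() are hypothetical stand-ins invented here. The
sketch assumes only what the commit message states: the probe skb is
placed in front of the write queue, source skbs whose content was fully
copied are freed, and the cached highest_sack pointer must be moved
unconditionally whenever it refers to an skb about to be freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins, NOT the kernel's struct sk_buff / struct tcp_sock. */
struct toy_skb {
    struct toy_skb *next;
    int len;                       /* bytes still held by this buffer */
};

struct toy_sock {
    struct toy_skb *head;          /* front of the write queue */
    struct toy_skb *highest_sack;  /* cached pointer into the queue */
};

/* Hypothetical helper: if the cached pointer refers to old_skb, move it
 * to new_skb. There is deliberately no extra condition here, because a
 * pointer to a freed buffer must never survive. */
static void toy_highest_sack_replace(struct toy_sock *sk,
                                     struct toy_skb *old_skb,
                                     struct toy_skb *new_skb)
{
    if (sk->highest_sack == old_skb)
        sk->highest_sack = new_skb;
}

/* Build a probe of up to probe_len bytes at the front of the queue by
 * taking bytes from the leading buffers. A buffer that is fully consumed
 * is freed, so the cached pointer is fixed up just before the free. */
static struct toy_skb *toy_mtu_probe(struct toy_sock *sk, int probe_len)
{
    struct toy_skb *probe = calloc(1, sizeof(*probe));
    struct toy_skb *skb;

    if (!probe)
        return NULL;

    skb = sk->head;
    while (skb && probe->len < probe_len) {
        struct toy_skb *next;
        int copy = probe_len - probe->len;

        if (copy > skb->len)
            copy = skb->len;
        probe->len += copy;
        skb->len -= copy;

        if (skb->len > 0)
            break;                 /* partially consumed: stays queued */

        next = skb->next;
        toy_highest_sack_replace(sk, skb, probe);  /* the fix */
        free(skb);
        skb = next;
    }

    probe->next = skb;
    sk->head = probe;
    return probe;
}
```

Without the toy_highest_sack_replace() call, sk->highest_sack would keep
pointing at a freed buffer after the loop, which is the failure mode the
commit message describes.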
