On 06/17/16 06:53, Gleb Smirnoff wrote:
Hi!
At Netflix we are observing a race in the TCP timers on head.
The problem is a regression that doesn't happen on stable/10.
The panic usually happens after several hours at 55 Gbit/s of
traffic.
What happens is that tcp_timer_keep finds t_tcpcb being
NULL. Some coredumps have the tcpcb already initialized,
with a non-NULL t_tcpcb and in TCPS_ESTABLISHED state, which
means that another CPU was working on the tcpcb while
the faulting one was handling the panic. So this all looks
like a use after free that collides with a new allocation.
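To make the suspected sequence concrete, here is a minimal user-space
sketch of a timer that outlives its object. It is not kernel code and
makes no claim about where the real bug is: struct conn, struct timer,
the one-slot allocator and the generation counter are all hypothetical
stand-ins, and the generation check is only one possible way to detect a
stale reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real FreeBSD definitions. */
enum conn_state { ST_FREE = 0, ST_ESTABLISHED };
enum timer_result { TIMER_OK, TIMER_NULL_OWNER, TIMER_STALE };

struct conn {
    enum conn_state state;
    void *owner;            /* plays the role of the back pointer found NULL */
    unsigned generation;    /* bumped on every free */
};

struct timer {
    struct conn *c;         /* raw pointer captured when the timer was armed */
    unsigned generation;    /* generation observed at arm time */
};

/* One-element "zone": a freed object is handed out again immediately. */
static struct conn slot;

static struct conn *conn_alloc(void *owner)
{
    assert(slot.state == ST_FREE);
    slot.state = ST_ESTABLISHED;
    slot.owner = owner;
    return &slot;
}

static void conn_free(struct conn *c)
{
    c->state = ST_FREE;
    c->owner = NULL;
    c->generation++;
}

static void timer_arm(struct timer *t, struct conn *c)
{
    t->c = c;
    t->generation = c->generation;
}

/* Handler that trusts its pointer, as a timer does if it was never drained. */
static enum timer_result timer_fire_unchecked(const struct timer *t)
{
    return t->c->owner == NULL ? TIMER_NULL_OWNER : TIMER_OK;
}

/* Handler that first checks whether the object was freed since arming. */
static enum timer_result timer_fire_checked(const struct timer *t)
{
    if (t->c->generation != t->generation)
        return TIMER_STALE;
    return timer_fire_unchecked(t);
}

/* Replays the sequence; returns 1 if all three observations hold. */
static int replay_race(void)
{
    int owner_a, owner_b;   /* only their addresses are used */
    struct timer t;

    struct conn *a = conn_alloc(&owner_a);
    timer_arm(&t, a);
    conn_free(a);           /* bug being modelled: timer not stopped first */

    /* Timer fires after the free: it sees the NULL back pointer. */
    int saw_null = timer_fire_unchecked(&t) == TIMER_NULL_OWNER;

    /* Another "CPU" reuses the same memory for a new connection. */
    struct conn *b = conn_alloc(&owner_b);
    int saw_new = b == a && b->state == ST_ESTABLISHED &&
        timer_fire_unchecked(&t) == TIMER_OK;

    /* With the generation check the stale timer is recognised. */
    int caught = timer_fire_checked(&t) == TIMER_STALE;

    conn_free(b);
    return saw_null && saw_new && caught;
}
```

Calling replay_race() walks through both observations from the cores:
the stale timer first sees a NULL back pointer, and after the slot is
reused the same pointer refers to a fresh object in the established
state.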
Comparing stable/10 and head, I see two changes that could
affect that:
- callout_async_drain
- switch to READ lock for inp info in tcp timers
That's why you are in the To: line, Julien and Hans :)
We continue investigating, and I will keep you updated.
However, any help is welcome. I can share cores.
Hi,
I do have projects/hps_head around, which is not that far behind
11-current and has a completely different callout implementation. If
you can reproduce the issue separately, it might be worth a try to rule
out the callout stack.
--HPS
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"