FYI, babeld seems to be affected by this same bug: https://github.com/jech/babeld/issues/50
The net.ipv6.route.max_size workaround is also mentioned there.

Baptiste

On 26-02-20, Basil Fillan wrote:
> Hi,
>
> We've also experienced this after upgrading a few routers to Debian Buster.
> With a kernel bisect we found that a bug was introduced in the following
> commit:
>
> 3b6761d18bc11f2af2a6fc494e9026d39593f22c
>
> This bug was still present in master as of a few weeks ago.
>
> It appears entries are added to the IPv6 route cache which aren't visible
> from "ip -6 route show cache", but are causing the route cache garbage
> collection system to trigger extremely often (every packet?) once it exceeds
> the value of net.ipv6.route.max_size. Our original symptom was extreme
> forwarding jitter caused within the ip6_dst_gc function (identified by some
> spelunking with systemtap & perf) worsening as the size of the cache
> increased. This was due to our max_size sysctl inadvertently being set to 1
> million. Reducing this value to the default 4096 broke IPv6 forwarding
> entirely on our test system under affected kernels. Our documentation had
> this sysctl marked as the maximum number of IPv6 routes, so it looks like
> the use changed at some point.
>
> We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096)
> for now, which fixed our immediate issue.
>
> You can reproduce this by adding more than 4096 (default value of the
> sysctl) routes to the kernel and running "ip route get" for each of them.
> Once the route cache is filled, the error "RTNETLINK answers: Network is
> unreachable" will be received for each subsequent "ip route get"
> incantation, and v6 connectivity will be interrupted.
>
> Thanks,
>
> Basil
>
>
> On 26/02/2020 20:38, Clément Guivy wrote:
> > Hi, did anyone find a solution or workaround regarding this issue?
> > Considering a router use case.
> > I have looked at rt6_stats, total route count is around 78k (full view),
> > and around 4100 entries in the cache at the moment on my first router
> > (forwarding a few Mb/s) and around 2500 entries on my second router
> > (forwarding less than 1 Mb/s).
> > I have reread the entire thread. At first, Alarig's research seemed to
> > lead to a neighbor management problem, my understanding is that route
> > cache is something else entirely - or is it related somehow?
> >
> >
> > On 03/12/2019 19:29, Alarig Le Lay wrote:
> > > We agree then, and I act as a router on all those machines.
> > >
> > > On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat
> > > <vinc...@bernat.ch> wrote:
> > >
> > >     This is the result of PMTUd. But when you are a router, you don't
> > >     need to do that, so it's mostly a problem for end hosts.
> > >
> > >     On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay
> > >     <ala...@swordarmor.fr> wrote:
> > >
> > >         On 03/12/2019 14:16, Vincent Bernat wrote:
> > >
> > >             The information needs to be stored somewhere.
> > >
> > >
> > >         Why has it to be stored? It’s not really my problem if
> > >         someone else has a non-standard MTU and can’t do TCP-MSS
> > >         or PMTUd.
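
For anyone who wants to try the reproduction Basil describes in the quoted message (add more than 4096 routes, then run "ip route get" against each of them), here is a minimal dry-run sketch. It only builds the command strings and does not touch the kernel. The function name repro_commands, the 2001:db8::/32 documentation prefix, the fe80::1 next hop and the eth0 interface are all hypothetical placeholders, not taken from the thread. It uses "ip -6 route get" where the quoted message just says "ip route get".

```python
import ipaddress


def repro_commands(count=5000, base="2001:db8::/32", via="fe80::1", dev="eth0"):
    """Build (add_cmds, get_cmds) for the route-cache reproduction.

    All defaults are illustrative placeholders: adjust the prefix, next hop
    and interface to match a disposable test machine.
    """
    net = ipaddress.ip_network(base)
    add_cmds, get_cmds = [], []
    # Carve one /64 per route out of the base prefix; zip() stops after `count`.
    for _, subnet in zip(range(count), net.subnets(new_prefix=64)):
        add_cmds.append(f"ip -6 route add {subnet} via {via} dev {dev}")
        # Look up one address inside each /64 so every route is exercised once.
        target = subnet.network_address + 1
        get_cmds.append(f"ip -6 route get {target}")
    return add_cmds, get_cmds
```

Printing the two lists one command per line and feeding them to a root shell on a throwaway box should, per the report above, start returning "RTNETLINK answers: Network is unreachable" once the count passes net.ipv6.route.max_size on an affected kernel. I have not verified that outcome myself.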