Re: IPv6 BGP & kernel 4.19 (and up to 5.10.46)
Hey Oliver,

Oliver writes:
> [...]
> Why is the default value of net.ipv6.route.max_size still 4096?
> Compared to the IPv4 value:
> net.ipv4.route.max_size = 2147483647
>
> Has someone done more research on this topic?

I believe this is a question that should be asked on the LKML or linux-net mailing list - it's very valid and I'd be in favor of aligning it with the IPv4 value.

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
Re: IPv6 BGP & kernel 4.19 (and up to 5.10.46)
Hello,

back again on this topic. This problem is still not completely fixed with Debian Bullseye and kernel 5.10.46.

The workaround is still:
net.ipv6.route.max_size = 40
net.ipv6.route.gc_thresh = 102400

On https://bird.network.cz/pipermail/bird-users/2020-March/014406.html it is mentioned that you can also set:
net.ipv6.route.gc_thresh = -1
But is this value safe to use? This is also the default for IPv4:
net.ipv4.route.gc_thresh = -1

With the default value of net.ipv6.route.gc_thresh = 1024 we still have a lot of jitter on the line.

Why is the default value of net.ipv6.route.max_size still 4096? Compared to the IPv4 value:
net.ipv4.route.max_size = 2147483647

Has someone done more research on this topic?

Best regards,
Oliver
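[Editor's note: a minimal sketch of making the workaround above persistent across reboots, assuming a Debian-style /etc/sysctl.d/ layout. The file name is illustrative; the values are the ones quoted in the mail above and should be tuned for your table size.]

```
# /etc/sysctl.d/99-ipv6-route.conf  (illustrative path)
# Workaround values as reported in this thread; tune for your FIB size.
net.ipv6.route.max_size = 40
net.ipv6.route.gc_thresh = 102400
```

Apply without a reboot via `sysctl --system` (requires root).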
Re: IPv6 BGP & kernel 4.19
On Thu, 24 Sep 2020, Clément Guivy wrote:
> On 24/09/2020 14:37, Oliver wrote:
> > Hello,
> >
> > after upgrading to debian buster with kernel 4.19 we also had problems.
>
> How filled is your route cache compared to the sysctl threshold? See the
> (hex) value with :
> cut -d' ' -f 6 /proc/net/rt6_stats

awk '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats

It is between 1315 (0523) and 1738 (06ca)

> Do you get a "Network is unreachable" error at some point if you do an "ip
> route get" on each prefix of your (ipv6) routing table? (while you are doing
> this test you should see the cache being filled according to the rt6_stats
> file as said before)

After we set "net.ipv6.route.max_size = 40" we do not get any "Network is
unreachable" anymore. This is how we tested it:

ip -6 route | egrep "^[0-9a-f]{1,4}:" | awk '{ print $1; }' | sed "s#/.*##" | xargs -L 1 ip -6 route get 1> /dev/null

> How filled is your neighbor table compared to the sysctl threshold? You can
> read it with :
> ip -6 neigh sh | wc -l

18 (so very low)

> Do you notice random drops on Bird sessions?

After we set "net.ipv6.route.max_size = 40" we do not have any drops anymore.

We have many ipv6_routes:
cat /proc/net/ipv6_route | wc -l
207281
(so more than the normal full IPv6 BGP table which is around 9)

At the moment just the "jitter" is the problem we have. I just increased
net.ipv6.route.gc_thresh to 102400 as suggested by micah (1/4 of
ipv6.route.max_size). With the increased value /proc/net/rt6_stats goes up
to 2186 (088a) and stays in that region.

So for the past minutes with this config everything runs smoothly:
net.ipv6.route.max_size = 40
net.ipv6.route.gc_thresh = 102400

I did not change the net.ipv6.neigh.default.gc_thresh* values.

I will monitor the values and write again after some time.

Oliver
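[Editor's note: Oliver's one-liner above discards all output; the same test can be written as a loop that reports each destination whose lookup fails. This is a sketch, assuming the usual iproute2 `ip -6 route` output format where the prefix is the first field.]

```shell
#!/bin/sh
# Walk every IPv6 prefix in the FIB and try a route lookup on it.
# A failing lookup ("Network is unreachable") prints the destination,
# instead of being silently discarded as in the xargs one-liner.
ip -6 route | grep -E '^[0-9a-f]{1,4}:' | awk '{ print $1 }' | sed 's#/.*##' |
while read -r dst; do
    ip -6 route get "$dst" >/dev/null 2>&1 || echo "lookup failed: $dst"
done
```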
Re: IPv6 BGP & kernel 4.19
On 24/09/2020 14:37, Oliver wrote:
> Hello,
>
> after upgrading to debian buster with kernel 4.19 we also had problems.

How filled is your route cache compared to the sysctl threshold? See the (hex) value with :
cut -d' ' -f 6 /proc/net/rt6_stats

Do you get a "Network is unreachable" error at some point if you do an "ip route get" on each prefix of your (ipv6) routing table? (while you are doing this test you should see the cache being filled according to the rt6_stats file as said before)

How filled is your neighbor table compared to the sysctl threshold? You can read it with :
ip -6 neigh sh | wc -l

Do you notice random drops on Bird sessions?
Re: IPv6 BGP & kernel 4.19
Oliver writes:
> Hello,
>
> after upgrading to debian buster with kernel 4.19 we also had problems.
>
> By adjusting net.ipv6.route.max_size we have fixed the following messages:
> watchdog: BUG: soft lockup - CPU#X stuck for 22s!
> and
> ixgbe :02:00.0 ens2fX: initiating reset due to tx timeout
>
> But we still had a lot of jitter on the line. Downgrading to 4.9.0 fixed the
> problem, but this is not a permanent solution.
>
> What else did we try:
> * Increasing gc_threshX
>   net.ipv6.neigh.default.gc_thresh1 = 2048
>   net.ipv6.neigh.default.gc_thresh2 = 4096
>   net.ipv6.neigh.default.gc_thresh3 = 8192
>   => Did not help

The linux kernel is getting rid of ipv6 caching, like it did with ipv4, but
it will take some time to get there.

It seems that in this kernel they have set a small value for
net.ipv6.route.max_size (4096!), and when this parameter is increased
(e.g. 1048576) the problem went away for us. I'm not 100% clear on what
units this value is in; I had around 89k ipv6 routes, so this value is
definitely higher. I'm sure that setting it too high could result in some
memory issues.

Additionally, you also want to raise net.ipv6.route.gc_thresh to avoid
running the garbage collector too often. I found that the rule of thumb
here is 1/4 the size of ipv6.route.max_size.

I did find that in Linux kernel 5.2 there is a message output to the kernel
ring buffer when ipv6.route.max_size is hit, so you at least have a *clue*
what is going on. In 4.19, which is what Debian Buster has, you don't get
that clue.

--
micah
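[Editor's note: micah's rule of thumb (gc_thresh at 1/4 of max_size) can be sketched as a small shell snippet. The max_size value 1048576 is the example from the mail, not a recommendation; the `sysctl -w` lines are the standard way to apply it at runtime and require root.]

```shell
#!/bin/sh
# Rule of thumb from this thread: set net.ipv6.route.gc_thresh to
# 1/4 of net.ipv6.route.max_size so the GC does not run too often.
MAX_SIZE=1048576          # example value from the mail
GC_THRESH=$((MAX_SIZE / 4))

echo "net.ipv6.route.max_size = ${MAX_SIZE}"
echo "net.ipv6.route.gc_thresh = ${GC_THRESH}"

# To apply at runtime (requires root):
#   sysctl -w net.ipv6.route.max_size="${MAX_SIZE}"
#   sysctl -w net.ipv6.route.gc_thresh="${GC_THRESH}"
```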
Re: IPv6 BGP & kernel 4.19
Hello,

after upgrading to debian buster with kernel 4.19 we also had problems.

By adjusting net.ipv6.route.max_size we have fixed the following messages:
watchdog: BUG: soft lockup - CPU#X stuck for 22s!
and
ixgbe :02:00.0 ens2fX: initiating reset due to tx timeout

But we still had a lot of jitter on the line. Downgrading to 4.9.0 fixed the problem, but this is not a permanent solution.

What else did we try:
* Increasing gc_threshX
  net.ipv6.neigh.default.gc_thresh1 = 2048
  net.ipv6.neigh.default.gc_thresh2 = 4096
  net.ipv6.neigh.default.gc_thresh3 = 8192
  => Did not help
* Going to a backports kernel (5.7.0)
  => Did not help

@Frederik Kriewitz: What did you do to fix that problem?

Oliver
Re: IPv6 BGP & kernel 4.19
Thanks. I found a solution which seems to be working so far, with the regular
Debian 4.19 kernel, on my 2 edge routers. I set both net.ipv6.route.gc_thresh
and net.ipv6.route.max_size to 131072; the reasoning behind that is to have
this limit above the number of routes in the full view, so that gc is not
triggered too often.

Once the full view is loaded I can now perform an 'ip route get' lookup on
each and every prefix without getting a "Network is unreachable" error
(thanks for the tip Basil), nor face a noticeable service disruption, and
IPv6 BGP sessions have also been stable so far (i.e. for a few days).

If anyone reproduces this solution (or finds another one) I would be glad to
know.

On 16/03/2020 12:41, Baptiste Jonglez wrote:
> FYI, babeld seems to be affected by this same bug:
> https://github.com/jech/babeld/issues/50
>
> The net.ipv6.route.max_size workaround is also mentioned there.
>
> Baptiste
>
> On 26-02-20, Basil Fillan wrote:
>> Hi,
>>
>> We've also experienced this after upgrading a few routers to Debian
>> Buster. With a kernel bisect we found that a bug was introduced in the
>> following commit:
>>
>> 3b6761d18bc11f2af2a6fc494e9026d39593f22c
>>
>> This bug was still present in master as of a few weeks ago.
>>
>> It appears entries are added to the IPv6 route cache which aren't visible
>> from "ip -6 route show cache", but are causing the route cache garbage
>> collection system to trigger extremely often (every packet?) once it
>> exceeds the value of net.ipv6.route.max_size. Our original symptom was
>> extreme forwarding jitter caused within the ip6_dst_gc function
>> (identified by some spelunking with systemtap & perf), worsening as the
>> size of the cache increased. This was due to our max_size sysctl
>> inadvertently being set to 1 million. Reducing this value to the default
>> 4096 broke IPv6 forwarding entirely on our test system under affected
>> kernels. Our documentation had this sysctl marked as the maximum number
>> of IPv6 routes, so it looks like the use changed at some point.
>>
>> We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096)
>> for now, which fixed our immediate issue.
>>
>> You can reproduce this by adding more than 4096 (default value of the
>> sysctl) routes to the kernel and running "ip route get" for each of them.
>> Once the route cache is filled, the error "RTNETLINK answers: Network is
>> unreachable" will be received for each subsequent "ip route get"
>> incantation, and v6 connectivity will be interrupted.
>>
>> Thanks,
>>
>> Basil
>>
>> On 26/02/2020 20:38, Clément Guivy wrote:
>>> Hi, did anyone find a solution or workaround regarding this issue?
>>> Considering a router use case.
>>> I have looked at rt6_stats, total route count is around 78k (full view),
>>> and around 4100 entries in the cache at the moment on my first router
>>> (forwarding a few Mb/s) and around 2500 entries on my second router
>>> (forwarding less than 1 Mb/s).
>>> I have reread the entire thread. At first, Alarig's research seemed to
>>> lead to a neighbor management problem; my understanding is that route
>>> cache is something else entirely - or is it related somehow?
>>>
>>> On 03/12/2019 19:29, Alarig Le Lay wrote:
>>>> We agree then, and I act as a router on all those machines.
>>>>
>>>> On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:
>>>>> This is the result of PMTUd. But when you are a router, you don't
>>>>> need to do that, so it's mostly a problem for end hosts.
>>>>>
>>>>> On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay wrote:
>>>>>> On 03/12/2019 14:16, Vincent Bernat wrote:
>>>>>>> The information needs to be stored somewhere.
>>>>>>
>>>>>> Why does it have to be stored? It’s not really my problem if someone
>>>>>> else has a non-standard MTU and can’t do TCP-MSS or PMTUd.
Re: IPv6 BGP & kernel 4.19
FYI, babeld seems to be affected by this same bug:
https://github.com/jech/babeld/issues/50

The net.ipv6.route.max_size workaround is also mentioned there.

Baptiste

On 26-02-20, Basil Fillan wrote:
> Hi,
>
> We've also experienced this after upgrading a few routers to Debian Buster.
> With a kernel bisect we found that a bug was introduced in the following
> commit:
>
> 3b6761d18bc11f2af2a6fc494e9026d39593f22c
>
> This bug was still present in master as of a few weeks ago.
>
> It appears entries are added to the IPv6 route cache which aren't visible
> from "ip -6 route show cache", but are causing the route cache garbage
> collection system to trigger extremely often (every packet?) once it exceeds
> the value of net.ipv6.route.max_size. Our original symptom was extreme
> forwarding jitter caused within the ip6_dst_gc function (identified by some
> spelunking with systemtap & perf), worsening as the size of the cache
> increased. This was due to our max_size sysctl inadvertently being set to 1
> million. Reducing this value to the default 4096 broke IPv6 forwarding
> entirely on our test system under affected kernels. Our documentation had
> this sysctl marked as the maximum number of IPv6 routes, so it looks like
> the use changed at some point.
>
> We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096)
> for now, which fixed our immediate issue.
>
> You can reproduce this by adding more than 4096 (default value of the
> sysctl) routes to the kernel and running "ip route get" for each of them.
> Once the route cache is filled, the error "RTNETLINK answers: Network is
> unreachable" will be received for each subsequent "ip route get"
> incantation, and v6 connectivity will be interrupted.
>
> Thanks,
>
> Basil
>
> On 26/02/2020 20:38, Clément Guivy wrote:
>> Hi, did anyone find a solution or workaround regarding this issue?
>> Considering a router use case.
>> I have looked at rt6_stats, total route count is around 78k (full view),
>> and around 4100 entries in the cache at the moment on my first router
>> (forwarding a few Mb/s) and around 2500 entries on my second router
>> (forwarding less than 1 Mb/s).
>> I have reread the entire thread. At first, Alarig's research seemed to
>> lead to a neighbor management problem; my understanding is that route
>> cache is something else entirely - or is it related somehow?
>>
>> On 03/12/2019 19:29, Alarig Le Lay wrote:
>>> We agree then, and I act as a router on all those machines.
>>>
>>> On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:
>>>> This is the result of PMTUd. But when you are a router, you don't
>>>> need to do that, so it's mostly a problem for end hosts.
>>>>
>>>> On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay wrote:
>>>>> On 03/12/2019 14:16, Vincent Bernat wrote:
>>>>>> The information needs to be stored somewhere.
>>>>>
>>>>> Why does it have to be stored? It’s not really my problem if someone
>>>>> else has a non-standard MTU and can’t do TCP-MSS or PMTUd.
Re: IPv6 BGP & kernel 4.19
Hi,

We've also experienced this after upgrading a few routers to Debian Buster.
With a kernel bisect we found that a bug was introduced in the following
commit:

3b6761d18bc11f2af2a6fc494e9026d39593f22c

This bug was still present in master as of a few weeks ago.

It appears entries are added to the IPv6 route cache which aren't visible
from "ip -6 route show cache", but are causing the route cache garbage
collection system to trigger extremely often (every packet?) once it exceeds
the value of net.ipv6.route.max_size. Our original symptom was extreme
forwarding jitter caused within the ip6_dst_gc function (identified by some
spelunking with systemtap & perf), worsening as the size of the cache
increased. This was due to our max_size sysctl inadvertently being set to 1
million. Reducing this value to the default 4096 broke IPv6 forwarding
entirely on our test system under affected kernels. Our documentation had
this sysctl marked as the maximum number of IPv6 routes, so it looks like
the use changed at some point.

We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096)
for now, which fixed our immediate issue.

You can reproduce this by adding more than 4096 (default value of the
sysctl) routes to the kernel and running "ip route get" for each of them.
Once the route cache is filled, the error "RTNETLINK answers: Network is
unreachable" will be received for each subsequent "ip route get"
incantation, and v6 connectivity will be interrupted.

Thanks,

Basil

On 26/02/2020 20:38, Clément Guivy wrote:
> Hi, did anyone find a solution or workaround regarding this issue?
> Considering a router use case.
> I have looked at rt6_stats, total route count is around 78k (full view),
> and around 4100 entries in the cache at the moment on my first router
> (forwarding a few Mb/s) and around 2500 entries on my second router
> (forwarding less than 1 Mb/s).
> I have reread the entire thread. At first, Alarig's research seemed to
> lead to a neighbor management problem; my understanding is that route
> cache is something else entirely - or is it related somehow?
>
> On 03/12/2019 19:29, Alarig Le Lay wrote:
>> We agree then, and I act as a router on all those machines.
>>
>> On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:
>>> This is the result of PMTUd. But when you are a router, you don't need
>>> to do that, so it's mostly a problem for end hosts.
>>>
>>> On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay wrote:
>>>> On 03/12/2019 14:16, Vincent Bernat wrote:
>>>>> The information needs to be stored somewhere.
>>>>
>>>> Why does it have to be stored? It’s not really my problem if someone
>>>> else has a non-standard MTU and can’t do TCP-MSS or PMTUd.
Re: IPv6 BGP & kernel 4.19
Hi, did anyone find a solution or workaround regarding this issue?
Considering a router use case.

I have looked at rt6_stats; total route count is around 78k (full view),
and around 4100 entries in the cache at the moment on my first router
(forwarding a few Mb/s) and around 2500 entries on my second router
(forwarding less than 1 Mb/s).

I have reread the entire thread. At first, Alarig's research seemed to lead
to a neighbor management problem; my understanding is that route cache is
something else entirely - or is it related somehow?

On 03/12/2019 19:29, Alarig Le Lay wrote:
> We agree then, and I act as a router on all those machines.
>
> On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:
>> This is the result of PMTUd. But when you are a router, you don't need
>> to do that, so it's mostly a problem for end hosts.
>>
>> On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay wrote:
>>> On 03/12/2019 14:16, Vincent Bernat wrote:
>>>> The information needs to be stored somewhere.
>>>
>>> Why does it have to be stored? It’s not really my problem if someone
>>> else has a non-standard MTU and can’t do TCP-MSS or PMTUd.
Re: IPv6 BGP & kernel 4.19
We agree then, and I act as a router on all those machines.

On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:
>This is the result of PMTUd. But when you are a router, you don't need
>to do that, so it's mostly a problem for end hosts.
>
>On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay wrote:
>>On 03/12/2019 14:16, Vincent Bernat wrote:
>>> The information needs to be stored somewhere.
>>
>>Why does it have to be stored? It’s not really my problem if someone else
>>has a non-standard MTU and can’t do TCP-MSS or PMTUd.
>>
>>--
>>Alarig
>
>--
>Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: IPv6 BGP & kernel 4.19
On 03/12/2019 14:16, Vincent Bernat wrote:
> The information needs to be stored somewhere.

Why does it have to be stored? It’s not really my problem if someone else
has a non-standard MTU and can’t do TCP-MSS or PMTUd.

--
Alarig
Re: IPv6 BGP & kernel 4.19
❦ 3 décembre 2019 12:48 +01, Alarig Le Lay :

>> It's not unexpected. A cache entry is for a /128.
>
> When I’m routing 80k prefixes I don’t want to have n /128 routes because
> someone doesn’t have an MTU of 1500. Is there a way to disable this
> behaviour?

I don't think there is. The information needs to be stored somewhere. With
IPv6, they are materialized as regular route entries tagged as "cached
routes". With IPv4, they are stored inside a route entry.

--
Don't stop with your first draft.
- The Elements of Programming Style (Kernighan & Plauger)
Re: IPv6 BGP & kernel 4.19
On 03/12/2019 11:58, Vincent Bernat wrote:
> It's not unexpected. A cache entry is for a /128.

When I’m routing 80k prefixes I don’t want to have n /128 routes because
someone doesn’t have an MTU of 1500. Is there a way to disable this
behaviour?

--
Alarig
Re: IPv6 BGP & kernel 4.19
❦ 3 décembre 2019 11:46 +01, Alarig Le Lay :

> So, I have more routes in cache than in FIB on my two core routers, I’m
> pretty sure there is a bug there :p

It's not unexpected. A cache entry is for a /128.

> I have less routes in cache on 4.14 kernels but more traffic.
>
> I don’t know which function is feeding the cache, but I think that it’s
> doing too much.

The function is ip6_rt_cache_alloc(). It is being called on PMTU
exceptions, on redirects, and in this last case, which I currently fail to
understand:

> ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set
>
> This patch always creates RTF_CACHE clone with DST_NOCACHE
> when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to
> the fl6->daddr.

--
It is a wise father that knows his own child.
-- William Shakespeare, "The Merchant of Venice"
Re: IPv6 BGP & kernel 4.19
On mar. 3 déc. 09:40:31 2019, Vincent Bernat wrote:
> So, there are 0x56 entries in the cache. Isn't that clear? :)
>
> https://elixir.bootlin.com/linux/latest/source/net/ipv6/route.c#L6006

I did a quick test on some routers:

core01-arendal, no fullview, on my own ASN, not so much traffic, using tunnels
https://pix.milkywan.fr/apWaD84h.png
core01-arendal ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
86 (0056)
86 (0056)
86 (0056)
core01-arendal ~ # ip -6 r | wc -l
64
core01-arendal ~ # uname -a
Linux core01-arendal.no.swordarmor.fr 4.19.86-gentoo #1 SMP Mon Dec 2 19:02:33 CET 2019 x86_64 AMD GX-412TC SOC AuthenticAMD GNU/Linux

core02-arendal, no fullview, on my own ASN, not so much traffic, using tunnels
https://pix.milkywan.fr/NF3jNY9K.png
core02-arendal ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
28 (001c)
30 (001e)
30 (001e)
core02-arendal ~ # ip -6 r | wc -l
39
core02-arendal ~ # uname -a
Linux core02-arendal.no.swordarmor.fr 4.19.86-gentoo #1 SMP Mon Dec 2 22:08:21 CET 2019 x86_64 AMD G-T40E Processor AuthenticAMD GNU/Linux

edge01-terrahost, fullview, on my own ASN, not so much traffic, using one tunnel
https://pix.milkywan.fr/6AVwYkY8.png
edge01-terrahost ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
96 (0060)
101 (0065)
101 (0065)
edge01-terrahost ~ # ip -6 r | wc -l
77439
edge01-terrahost ~ # uname -a
Linux edge01-terrahost.no.swordarmor.fr 4.19.82-gentoo #2 SMP Tue Nov 12 22:08:28 CET 2019 x86_64 Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz GenuineIntel GNU/Linux

edge02-fjordane, fullview, on my own ASN, not so much traffic, using one tunnel
https://pix.milkywan.fr/J4rOuylq.png
edge02-fjordane ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
110 (006e)
110 (006e)
110 (006e)
edge02-fjordane ~ # ip -6 r | wc -l
77433
edge02-fjordane ~ # uname -a
Linux edge02-fjordane.no.swordarmor.fr 4.19.86-gentoo #1 SMP Thu Nov 28 16:47:53 CET 2019 x86_64 Common KVM processor GenuineIntel GNU/Linux

regis, fullview, on my own ASN, a bit more traffic, using one tunnel
https://pix.milkywan.fr/5XeaK2du.png
regis ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
0 ()
1 (0001)
1 (0001)
regis ~ # ip -6 r | wc -l
77538
regis ~ # uname -a
Linux regis.swordarmor.fr 4.14.83-gentoo #2 SMP Sat Feb 2 16:50:41 CET 2019 x86_64 Intel(R) Xeon(R) CPU X3450 @ 2.67GHz GenuineIntel GNU/Linux

asbr02, fullview, on a not-for-profit ASN providing services for others, 100M of traffic, using one tunnel
https://pix.milkywan.fr/l1hfAAIn.png
alarig@asbr02 ~ $ while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
4 (0004)
3 (0003)
0 ()
alarig@asbr02 ~ $ ip -6 r | wc -l
77525
alarig@asbr02 ~ $ uname -a
Linux asbr02.cogent-rns.grifon.fr 4.14.156-gentoo #1 SMP Tue Dec 3 09:53:23 CET 2019 x86_64 Intel(R) Xeon(R) CPU X3450 @ 2.67GHz GenuineIntel GNU/Linux

So, I have more routes in cache than in FIB on my two core routers, I’m
pretty sure there is a bug there :p

I have less routes in cache on 4.14 kernels but more traffic.

I don’t know which function is feeding the cache, but I think that it’s
doing too much.

--
Alarig
Re: IPv6 BGP & kernel 4.19
❦ 3 décembre 2019 08:56 +01, Alarig Le Lay :

>> Just to be clear: I did forget this fact and therefore my initial
>> recommendation to increase max_size with more than 4096 active hosts
>> does not apply anymore (as long as you have a 4.2+ kernel). Keep the
>> default value and watch `/proc/net/rt6_stats`.
>
> core01-arendal ~ # cat /proc/net/rt6_stats
> 0048 002c 5e56 0050 0056 0020
>
> Is it supposed to be understandable? :D

So, there are 0x56 entries in the cache. Isn't that clear? :)

https://elixir.bootlin.com/linux/latest/source/net/ipv6/route.c#L6006

--
Modularise. Use subroutines.
- The Elements of Programming Style (Kernighan & Plauger)
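[Editor's note: a portable way to pull out and decode the hex counter Vincent points at. The field meanings come from `rt6_stats_seq_show()` in net/ipv6/route.c; treating the sixth field as the dst-cache entry count is an assumption based on the kernels discussed here. `printf` does the hex conversion, so no gawk `--non-decimal-data` extension is needed.]

```shell
#!/bin/sh
# Print the 6th field of /proc/net/rt6_stats (dst cache entries,
# per net/ipv6/route.c) in decimal alongside its hex form.
hex=$(awk '{ print $6 }' /proc/net/rt6_stats)
printf 'cache entries: %d (0x%s)\n' "0x${hex}" "${hex}"
```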
Re: IPv6 BGP & kernel 4.19
On 02/12/2019 23:04, Vincent Bernat wrote:
> Just to be clear: I did forget this fact and therefore my initial
> recommendation to increase max_size with more than 4096 active hosts
> does not apply anymore (as long as you have a 4.2+ kernel). Keep the
> default value and watch `/proc/net/rt6_stats`.

core01-arendal ~ # cat /proc/net/rt6_stats
0048 002c 5e56 0050 0056 0020

Is it supposed to be understandable? :D

--
Alarig
Re: IPv6 BGP & kernel 4.19
❦ 2 décembre 2019 22:48 +01, Vincent Bernat :

> Also, from 4.2, the cache entries are only created for exceptions (PMTU
> notably). So, in fact, the initial value should be mostly safe. You can
> monitor it with `/proc/net/rt6_stats`. This is the second-to-last value.
> If you can share what you have, I would be curious to know how low it is
> (compared to the 4th entry notably).

Just to be clear: I did forget this fact and therefore my initial
recommendation to increase max_size with more than 4096 active hosts does
not apply anymore (as long as you have a 4.2+ kernel). Keep the default
value and watch `/proc/net/rt6_stats`.

--
Program defensively.
- The Elements of Programming Style (Kernighan & Plauger)
Re: IPv6 BGP & kernel 4.19
❦ 2 décembre 2019 21:58 +01, Alarig Le Lay :

>> For IPv6, this is the size of the routing cache. If you have more than
>> 4096 active hosts, Linux will aggressively try to run garbage
>> collection, eating CPU. In this case, increase both
>> net.ipv6.route.max_size and net.ipv6.route.gc_thresh.
>
> Do you know what the risks are when we raise those parameters? A bit
> more RAM consumption?

You are mostly safe with RAM: increasing the value to 512k would eat 256MB
of RAM. However, if an attacker is still able to overflow the cache, it is
costly in terms of CPU. This is a bit similar to the route cache for IPv4,
so you need to play with threshold, interval and timeout to try to keep CPU
usage down, but ultimately, a fast enough attacker can do a lot of damage
here. I don't have real-life experience with this aspect.

Also, from 4.2, the cache entries are only created for exceptions (PMTU
notably). So, in fact, the initial value should be mostly safe. You can
monitor it with `/proc/net/rt6_stats`. This is the second-to-last value. If
you can share what you have, I would be curious to know how low it is
(compared to the 4th entry notably).

--
Writing is turning one's worst moments into money.
-- J.P. Donleavy
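[Editor's note: Vincent's 256MB figure works out to roughly 512 bytes per cache entry. A back-of-the-envelope sketch; the per-entry size is inferred from his numbers, not measured, and the real struct size varies by kernel version.]

```shell
#!/bin/sh
# 512k entries at ~512 bytes/entry (inferred from the 256MB figure).
ENTRIES=$((512 * 1024))
BYTES_PER_ENTRY=512   # assumption; actual rt6_info/dst size varies
echo "$(( ENTRIES * BYTES_PER_ENTRY / 1024 / 1024 )) MiB"   # prints "256 MiB"
```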
Re: IPv6 BGP & kernel 4.19
Hi Vincent,

On lun. 2 déc. 21:38:21 2019, Vincent Bernat wrote:
> For IPv6, this is the size of the routing cache. If you have more than
> 4096 active hosts, Linux will aggressively try to run garbage
> collection, eating CPU. In this case, increase both
> net.ipv6.route.max_size and net.ipv6.route.gc_thresh.

Do you know what the risks are when we raise those parameters? A bit more
RAM consumption?

Regards,

--
Alarig
Re: IPv6 BGP & kernel 4.19
❦ 1 décembre 2019 19:20 +01, Clément Guivy :

> Hi, that's good news. One thing that still confuses me though is that
> the default values for these settings are the same in Debian 9 (4.9
> kernel) and Debian 10 (4.19 kernel), so I would expect the behaviour
> to be the same between both versions in that regard.
> Also I'm not sure to understand what this max_size parameter actually
> does, since I have it at the default value (4096), and yet the ipv6 route
> table at the moment is >70k entries large without the kernel complaining.

For IPv4, the parameter is ignored since Linux 3.6. For IPv6, this is the
size of the routing cache. If you have more than 4096 active hosts, Linux
will aggressively try to run garbage collection, eating CPU. In this case,
increase both net.ipv6.route.max_size and net.ipv6.route.gc_thresh. That's
a pity, but this value is not easily observable, so it's hard to know when
you hit it. Also, while IPv4 recently got back the ability to enumerate the
cache, this is not the case for IPv6.

This setting is a bit confusing as it is not documented and, in the past,
it was limiting the whole IPv6 route table (before Linux 3.0).

--
Write clearly - don't sacrifice clarity for "efficiency".
- The Elements of Programming Style (Kernighan & Plauger)
Re: IPv6 BGP & kernel 4.19
On 01/12/2019 18:20, Clément Guivy wrote:
> On 01/12/2019 13:43, Frederik Kriewitz wrote:
>> This is our current suspicion too. neighbours and routes are well
>> below 4096 in our case. We also had to adjust
>> net.ipv6.neigh.default.gc_thresh1/2/3. Since the adjustment it's been
>> working fine.
>
> Hi, that's good news. One thing that still confuses me though is that
> the default values for these settings are the same in Debian 9 (4.9
> kernel) and Debian 10 (4.19 kernel), so I would expect the behaviour to
> be the same between both versions in that regard.
> Also I'm not sure to understand what this max_size parameter actually
> does, since I have it at the default value (4096), and yet the ipv6 route
> table at the moment is >70k entries large without the kernel complaining.

To add our info - we're using Intel 82599ES NICs. We have a full table on
v4 and v6, and about 20 neighbors on each. Our route/max_size for v4 and v6
are defaults (2M and 4096 respectively) - and as noted, these values are
the same on our Stretch and Buster boxes.

Andrew
Re: IPv6 BGP & kernel 4.19
On 01/12/2019 13:43, Frederik Kriewitz wrote:
> This is our current suspicion too. neighbours and routes are well below
> 4096 in our case. We also had to adjust
> net.ipv6.neigh.default.gc_thresh1/2/3. Since the adjustment it's been
> working fine.

Hi, that's good news. One thing that still confuses me though is that the
default values for these settings are the same in Debian 9 (4.9 kernel) and
Debian 10 (4.19 kernel), so I would expect the behaviour to be the same
between both versions in that regard.

Also I'm not sure to understand what this max_size parameter actually does,
since I have it at the default value (4096), and yet the ipv6 route table
at the moment is >70k entries large without the kernel complaining.
Re: IPv6 BGP & kernel 4.19
On Sun, Dec 1, 2019 at 12:57 PM Daniel Suchy wrote:
> One idea that comes to my mind is the default kernel limit for IPv6
> routes in memory (sysctl net.ipv6.route.max_size); such a default is
> quite low for fullbgp/DFZ IPv6 deployments and it's still set to 4096 on
> Debian/Buster with stock kernels. Can people having issues with 4.19
> kernels check the sysctl mentioned above?

This is our current suspicion too. neighbours and routes are well below
4096 in our case. We also had to adjust
net.ipv6.neigh.default.gc_thresh1/2/3. Since the adjustment it's been
working fine.
Re: IPv6 BGP & kernel 4.19
Hello,

I'm running the bird 1.6.x branch (packages from Debian/Buster; currently
1.6.6) on recent 4.19 custom-built kernels without any issues (on armhf
hardware). My BGP sessions are carrying only a few routes (default + some
more specifics).

One idea that comes to my mind is the default kernel limit for IPv6 routes
in memory (sysctl net.ipv6.route.max_size); such a default is quite low for
fullbgp/DFZ IPv6 deployments and it's still set to 4096 on Debian/Buster
with stock kernels. Can people having issues with 4.19 kernels check the
sysctl mentioned above?

- Daniel

On 11/21/19 6:12 PM, Ondrej Zajicek wrote:
> On Thu, Nov 21, 2019 at 04:09:24PM +, Andrew Hearn wrote:
>>> Without traffic through the box (all IPv6 prefixes filtered) the bgp
>>> session is stable. With traffic the bgp session dies after some time
>>> and ssh connections in the default table freeze.
>>>
>>> I did some packet captures and saw tcp retransmissions before the hold
>>> timer expires.
>>>
>>> Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel
>>> bisect until September. (And no clue where to start and how to trigger
>>> the bug faster.)
>>
>> Sorry to bring up a fairly old thread...
>>
>> We believe we are seeing this problem too, since a Stretch->Buster
>> upgrade - was there a solution to this?
>
> Perhaps try kernel 5.2.x or 5.3.x from buster-backports?
Re: IPv6 BGP & kernel 4.19
Hi Frederik,

On 30.11.19 23:31, Frederik Kriewitz wrote:
> On Sat, Nov 30, 2019 at 12:26 PM Benedikt Neuffer wrote:
> Which NICs are you using?

We are using Intel X520.

Regards,
Benedikt

--
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Benedikt Neuffer
Netze und Telekommunikation (NET)

Hermann-von-Helmholtz-Platz 1
Gebäude 442 Raum 185
76344 Eggenstein-Leopoldshafen

Telefon: +49 721 608-24502
Fax: +49 721 608-47763
E-Mail: benedikt.neuf...@kit.edu
Web: https://www.scc.kit.edu

Sitz der Körperschaft: Kaiserstraße 12, 76131 Karlsruhe
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft
Re: IPv6 BGP & kernel 4.19
On sam. 30 nov. 23:50:48 2019, Alarig Le Lay wrote: > We are using “Intel Corporation 82576 Gigabit Network Connection” NICs. And “Broadcom Limited NetXtreme II BCM5709 Gigabit Ethernet”, sorry I forgot this box. -- Alarig
Re: IPv6 BGP & kernel 4.19
On sam. 30 nov. 23:31:39 2019, Frederik Kriewitz wrote: > We don't know if this might be NIC related yet. We're seeing it happen > with Intel X710 NICs (With all offloading features disabled). Which > NICs are you using? We are using “Intel Corporation 82576 Gigabit Network Connection” NICs. -- Alarig
Re: IPv6 BGP & kernel 4.19
On Sat, Nov 30, 2019 at 12:26 PM Benedikt Neuffer wrote: > as far as I see one needs some traffic to reproduce the issue. Without > traffic I haven't seen the issue. Yes, we saw this behaviour too using the buster kernel. It seems to be traffic- and/or neighbour-related. Forwarding itself seems to work, but neighbour discovery stops working (that's why multicast-based OSPF sessions are not affected). In this state the kernel doesn't generate any neighbour solicitation packets (none visible using tcpdump). Once the neighbour cache times out, IPv6 connectivity is broken. We don't know yet whether this might be NIC-related. We're seeing it happen with Intel X710 NICs (with all offloading features disabled). Which NICs are you using? Resetting the NIC using ethtool -r $INTERFACE seems to have fixed it once for us. The problem also fixes itself after ~90 to 110 minutes, until it appears again.
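The broken-ND state Frederik describes shows up as FAILED entries in the neighbour cache. A toy filter — the sample lines below are invented stand-ins imitating `ip -6 neigh` output; on a live router you would pipe the real output in, and could run `ethtool -r $IFACE` when the count is nonzero:

```shell
# Count FAILED entries in (simulated) `ip -6 neigh` output.
# The addresses are documentation examples, not real neighbours.
printf '%s\n' \
  'fe80::1 dev eth0 lladdr 00:11:22:33:44:55 router REACHABLE' \
  '2001:db8::2 dev eth0 FAILED' \
  '2001:db8::3 dev eth0 lladdr 66:77:88:99:aa:bb STALE' |
awk '$NF == "FAILED" { n++ } END { print n+0, "failed neighbours" }'
# -> 1 failed neighbours
```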
Re: IPv6 BGP & kernel 4.19
I saw it in production with ~20 VMs, but I don’t know how many are needed to trigger it. On sam. 30 nov. 11:43:29 2019, Stefan Jakob wrote: > Can anyone provide test configs? > > Is it testable inside two or three VMs? > > Could offer 5.3.X tests here. > > On Sat, Nov 23, 2019 at 6:48 PM Alarig Le Lay wrote: > > > > On jeu. 21 nov. 18:12:17 2019, Ondrej Zajicek wrote: > > > Perhaps try kernel 5.2.x or 5.3.x from buster-backports? > > > > I’m very interested in test results from newer kernels than 5.0.x > > > > -- > > Alarig
Re: IPv6 BGP & kernel 4.19
Hi all, On 30.11.19 11:43, Stefan Jakob wrote: > Can anyone provide test configs? > > Is it testable inside two or three VMs? > > Could offer 5.3.X tests here. > > On Sat, Nov 23, 2019 at 6:48 PM Alarig Le Lay wrote: >> >> On jeu. 21 nov. 18:12:17 2019, Ondrej Zajicek wrote: >>> Perhaps try kernel 5.2.x or 5.3.x from buster-backports? >> >> I’m very interested in test results from newer kernels than 5.0.x >> >> -- >> Alarig > As far as I see, one needs some traffic to reproduce the issue. Without traffic I haven't seen the issue. Regards, Benedikt
Re: IPv6 BGP & kernel 4.19
Can anyone provide test configs? Is it testable inside two or three VMs? Could offer 5.3.X tests here. On Sat, Nov 23, 2019 at 6:48 PM Alarig Le Lay wrote: > > On jeu. 21 nov. 18:12:17 2019, Ondrej Zajicek wrote: > > Perhaps try kernel 5.2.x or 5.3.x from buster-backports? > > I’m very interested by test results from newer kernels than 5.0.x > > -- > Alarig
Re: IPv6 BGP & kernel 4.19
On jeu. 21 nov. 18:12:17 2019, Ondrej Zajicek wrote: > Perhaps try kernel 5.2.x or 5.3.x from buster-backports? I’m very interested in test results from newer kernels than 5.0.x -- Alarig
Re: IPv6 BGP & kernel 4.19
On Thu, Nov 21, 2019 at 04:09:24PM +, Andrew Hearn wrote: > > Without traffic through the box (all IPv6 prefixes filtered) the bgp > > session is stable. With traffic the bgp session dies after some time > > and ssh connections in the default table freeze. > > > > I did some packet captures and saw tcp retransmissions before the hold timer > > expired. > > > > Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel > > bisect until September. (And no clue where to start or how to trigger > > the bug faster.) > > Sorry to bring up a fairly old thread... > > We believe we are seeing this problem too, since a Stretch->Buster > upgrade - was there a solution to this? Perhaps try kernel 5.2.x or 5.3.x from buster-backports? -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Re: IPv6 BGP & kernel 4.19
Hi, On 21/11/2019 17:46, Benedikt Neuffer wrote: > Hi Andrew, > > On 21.11.19 17:09, Andrew Hearn wrote: >> Sorry to bring up a fairly old thread... >> >> We believe we are seeing this problem too, since a Stretch->Buster >> upgrade - was there a solution to this? >> >> Thanks > > The problem still exists. We are still running on kernel 4.14.x. I had > no time to do any further debugging. > > Regards, > Benedikt > > I also had the problem with 5.x on proxmox 6. But I didn’t begin my debugging either, E_NOTIME… -- Alarig
Re: IPv6 BGP & kernel 4.19
Hi Andrew, On 21.11.19 17:09, Andrew Hearn wrote: > Sorry to bring up a fairly old thread... > > We believe we are seeing this problem too, since a Stretch->Buster > upgrade - was there a solution to this? > > Thanks The problem still exists. We are still running on kernel 4.14.x. I had no time to do any further debugging. Regards, Benedikt
Re: IPv6 BGP & kernel 4.19
On 20/06/2019 17:13, Benedikt Neuffer wrote: > Hi, > > On 19.06.19 20:09, Alarig Le Lay wrote: >> Hi, >> >> On mer. 19 juin 09:10:53 2019, Robert Sander wrote: >>> Hi, >>> >>> our routers run on Debian stretch with bird 1.6.4 from >>> bird.network.cz/debian. >>> >>> Yesterday I tried kernel 4.19 from backports.debian.org and ran into a >>> weird issue with IPv6 BGP sessions: >>> >>> All Peerings reported "Error: Hold timer expired" ca. every 40 minutes. >>> >>> IPv6 forwarding was flapping all the time. >>> >>> After rebooting into kernel 4.9 everything worked again. >>> >>> IPv4 BGP was not affected and also OSPF (v4 and v6). I could disable all >>> IPv6 BGP peerings on this router and it then forwarded IPv6 to another router >>> learned via OSPF without issues. >>> >>> Has anyone seen such behaviour? >> >> I’ve seen this with 4.19 on gentoo. For now I’m still running 4.14. >> https://archives.gentoo.org/gentoo-user/message/fab628cc53e4a55589410f9dff6abd23 >> > > Same here. Gentoo, Linux 4.19.52, Bird 2.0.4. I am running a full table > using a separate VRF and the default table as management VRF. > > Without traffic through the box (all IPv6 prefixes filtered) the bgp > session is stable. With traffic the bgp session dies after some time > and ssh connections in the default table freeze. > > I did some packet captures and saw tcp retransmissions before the hold timer > expired. > > Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel > bisect until September. (And no clue where to start or how to trigger > the bug faster.) Sorry to bring up a fairly old thread... We believe we are seeing this problem too, since a Stretch->Buster upgrade - was there a solution to this? Thanks Andrew.
Re: IPv6 BGP & kernel 4.19
Hi, On 19.06.19 20:09, Alarig Le Lay wrote: > Hi, > > On mer. 19 juin 09:10:53 2019, Robert Sander wrote: >> Hi, >> >> our routers run on Debian stretch with bird 1.6.4 from >> bird.network.cz/debian. >> >> Yesterday I tried kernel 4.19 from backports.debian.org and ran into a >> weird issue with IPv6 BGP sessions: >> >> All Peerings reported "Error: Hold timer expired" ca. every 40 minutes. >> >> IPv6 forwarding was flapping all the time. >> >> After rebooting into kernel 4.9 everything worked again. >> >> IPv4 BGP was not affected and also OSPF (v4 and v6). I could disable all >> IPv6 BGP peerings on this router and it then forwarded IPv6 to another router >> learned via OSPF without issues. >> >> Has anyone seen such behaviour? > > I’ve seen this with 4.19 on gentoo. For now I’m still running 4.14. > https://archives.gentoo.org/gentoo-user/message/fab628cc53e4a55589410f9dff6abd23 > Same here. Gentoo, Linux 4.19.52, Bird 2.0.4. I am running a full table using a separate VRF and the default table as management VRF. Without traffic through the box (all IPv6 prefixes filtered) the bgp session is stable. With traffic the bgp session dies after some time and ssh connections in the default table freeze. I did some packet captures and saw tcp retransmissions before the hold timer expired. Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel bisect until September. (And no clue where to start or how to trigger the bug faster.)
Regards, Bene
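The retransmissions Benedikt saw in his captures can be spotted in plain-text `tcpdump` output by watching for repeated sequence ranges on the BGP port (TCP 179). A toy sketch — the sample lines and their layout are invented stand-ins for real capture output, which would come from something like `tcpdump -lni eth0 'ip6 and tcp port 179'`:

```shell
# Flag repeated TCP sequence ranges (i.e. retransmissions) in simulated
# capture lines; the last field stands in for tcpdump's "seq a:b" range.
printf '%s\n' \
  '12:00:00 IP6 fd00::1.179 > fd00::2.41000: seq 100:200' \
  '12:00:01 IP6 fd00::1.179 > fd00::2.41000: seq 100:200' \
  '12:00:02 IP6 fd00::1.179 > fd00::2.41000: seq 200:300' |
awk '{ if (seen[$NF]++) retx++ } END { print retx+0, "retransmissions" }'
# -> 1 retransmissions
```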
Re: IPv6 BGP & kernel 4.19
Hi, On mer. 19 juin 09:10:53 2019, Robert Sander wrote: > Hi, > > our routers run on Debian stretch with bird 1.6.4 from > bird.network.cz/debian. > > Yesterday I tried kernel 4.19 from backports.debian.org and ran into a > weird issue with IPv6 BGP sessions: > > All Peerings reported "Error: Hold timer expired" ca. every 40 minutes. > > IPv6 forwarding was flapping all the time. > > After rebooting into kernel 4.9 everything worked again. > > IPv4 BGP was not affected and also OSPF (v4 and v6). I could disable all > IPv6 BGP peerings on this router and it then forwarded IPv6 to another router > learned via OSPF without issues. > > Has anyone seen such behaviour? I’ve seen this with 4.19 on gentoo. For now I’m still running 4.14. https://archives.gentoo.org/gentoo-user/message/fab628cc53e4a55589410f9dff6abd23 -- Alarig