Re: IPv6 BGP & kernel 4.19 (and up to 5.10.46)

2021-08-25 Thread Nico Schottelius


Hey Oliver,

Oliver  writes:
> [...]
> Why is the default value of net.ipv6.route.max_size still 4096?
> Compared to IPv4 value:
> net.ipv4.route.max_size = 2147483647
>
> Has someone done more research on this topic?

I believe this is a question that should be asked on the LKML or
linux-net mailing list - it's very valid and I'd be in favor of
aligning it with the IPv4 value.

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch


Re: IPv6 BGP & kernel 4.19 (and up to 5.10.46)

2021-08-25 Thread Oliver
Hello,

Back again on this topic: this problem is still not completely fixed with
Debian Bullseye and kernel 5.10.46.

The workaround is still:
net.ipv6.route.max_size = 400000
net.ipv6.route.gc_thresh = 102400
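
A sketch for making these persistent across reboots on a Debian system
(the sysctl.d file name is just an example):

cat > /etc/sysctl.d/99-ipv6-route.conf <<'EOF'
net.ipv6.route.max_size = 400000
net.ipv6.route.gc_thresh = 102400
EOF
sysctl --system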

On https://bird.network.cz/pipermail/bird-users/2020-March/014406.html it is
mentioned that you can also set:
net.ipv6.route.gc_thresh = -1

But is this value safe to use?

This is also the default for IPv4:
net.ipv4.route.gc_thresh = -1

With the default value of net.ipv6.route.gc_thresh = 1024 we still see a lot
of jitter on the line.

Why is the default value of net.ipv6.route.max_size still 4096?
Compared to IPv4 value:
net.ipv4.route.max_size = 2147483647

Has someone done more research on this topic?

Best regards,

Oliver




Re: IPv6 BGP & kernel 4.19

2020-09-24 Thread Oliver
On Thu, 24 Sep 2020, Clément Guivy wrote:

> On 24/09/2020 14:37, Oliver wrote:
> > Hello,
> > 
> > after upgrading to debian buster with kernel 4.19 we also had problems.
> 
> How filled is your route cache compared to the sysctl threshold? See the
> (hex) value with:
> cut -d' ' -f 6 /proc/net/rt6_stats
awk '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats
It is between 1315 (0523) and 1738 (06ca)

> Do you get a "Network is unreachable" error at some point if you do an "ip
> route get" on each prefix of your (ipv6) routing table? (while you are doing
> this test you should see the cache being filled according to the rt6_stats
> file as said before)
After we set "net.ipv6.route.max_size = 40" we do not get any "Network is
unreachable" anymore. 
This is how we tested it:
ip -6 route | egrep "^[0-9a-f]{1,4}:" | awk '{ print $1; }' | sed "s#/.*##" | xargs -L1 ip -6 route get 1> /dev/null
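
A variation of the same test that counts the failing lookups instead of
discarding them (assuming the error text is printed on stderr):

ip -6 route | egrep "^[0-9a-f]{1,4}:" | awk '{ print $1 }' | sed "s#/.*##" | xargs -L1 ip -6 route get 2>&1 >/dev/null | grep -c unreachable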

> How filled is your neighbor table compared to the sysctl threshold? You can
> read it with:
> ip -6 neigh sh | wc -l
18 (so very low)

> Do you notice random drops on Bird sessions?
After we set "net.ipv6.route.max_size = 400000" we do not have any drops
anymore.

We have many ipv6_routes:
cat /proc/net/ipv6_route | wc -l
207281 (so more than the normal full IPv6 BGP table, which is around 90000)

At the moment the "jitter" is the only problem we have.

I just increased net.ipv6.route.gc_thresh to 102400 as suggested by micah
(1/4 of ipv6.route.max_size).

With the increased value /proc/net/rt6_stats goes up to 2186 (088a) and
stays in that region.

So for the past minutes with this config everything runs smoothly:
net.ipv6.route.max_size = 400000
net.ipv6.route.gc_thresh = 102400

I did not change the net.ipv6.neigh.default.gc_thresh* values.

I will monitor the values and write again after some time.
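E.g. with something like this, reusing the awk one-liner from earlier in the
thread:

while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 300; done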

Oliver



Re: IPv6 BGP & kernel 4.19

2020-09-24 Thread Clément Guivy

On 24/09/2020 14:37, Oliver wrote:

Hello,

after upgrading to debian buster with kernel 4.19 we also had problems.


How filled is your route cache compared to the sysctl threshold? See the 
(hex) value with:

cut -d' ' -f 6 /proc/net/rt6_stats

Do you get a "Network is unreachable" error at some point if you do an 
"ip route get" on each prefix of your (ipv6) routing table? (while you 
are doing this test you should see the cache being filled according to 
the rt6_stats file as said before)


How filled is your neighbor table compared to the sysctl threshold? You 
can read it with:

ip -6 neigh sh | wc -l

Do you notice random drops on Bird sessions?
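
Putting those checks together, a quick status sketch (the awk variant of the
rt6_stats read is borrowed from elsewhere in the thread):

awk --non-decimal-data '{ print "route cache entries:", ("0x"$6)+0 }' /proc/net/rt6_stats
sysctl net.ipv6.route.max_size net.ipv6.route.gc_thresh
echo "neighbour entries: $(ip -6 neigh show | wc -l)"
sysctl net.ipv6.neigh.default.gc_thresh1 net.ipv6.neigh.default.gc_thresh2 net.ipv6.neigh.default.gc_thresh3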


Re: IPv6 BGP & kernel 4.19

2020-09-24 Thread micah anderson
Oliver  writes:

> Hello,
>
> after upgrading to debian buster with kernel 4.19 we also had problems.
>
> By adjusting net.ipv6.route.max_size we have fixed the following messages:
> watchdog: BUG: soft lockup - CPU#X stuck for 22s! 
> and
> ixgbe 0000:02:00.0 ens2fX: initiating reset due to tx timeout
>
> But we still had a lot of jitter on the line. Downgrading to 4.9.0 fixed the
> problem, but this is not a permanent solution.
>
> What else did we try:
> * Increasing gc_threshX
> net.ipv6.neigh.default.gc_thresh1 = 2048
> net.ipv6.neigh.default.gc_thresh2 = 4096
> net.ipv6.neigh.default.gc_thresh3 = 8192
> => Did not help

The linux kernel is getting rid of ipv6 caching, like it did with ipv4,
but it will take some time to get there. It seems that in this kernel
they have set a small value for net.ipv6.route.max_size (4096!), and
when we increased this parameter (e.g. to 1048576) the problem went
away for us.

I'm not 100% clear on what units this value is in; I had around 89k ipv6
routes, so this value is definitely higher. I'm sure that setting it too
high could result in some memory issues.

Additionally, you want to raise net.ipv6.route.gc_thresh to avoid
running the garbage collector too often. I found that the rule of thumb
here is 1/4 the size of ipv6.route.max_size.

I did find that in Linux kernel 5.2 there is a message output to the
kernel ring buffer when ipv6.route.max_size is hit, so you at least
have a *clue* as to what is going on. In 4.19, which is what Debian Buster
ships, you don't get that clue.
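
On 5.2+ something like this should surface that message while reproducing
(the exact message text will depend on the kernel version):

dmesg --follow | grep -i -E 'route|ipv6'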

-- 
micah


Re: IPv6 BGP & kernel 4.19

2020-09-24 Thread Oliver
Hello,

after upgrading to debian buster with kernel 4.19 we also had problems.

By adjusting net.ipv6.route.max_size we have fixed the following messages:
watchdog: BUG: soft lockup - CPU#X stuck for 22s! 
and
ixgbe 0000:02:00.0 ens2fX: initiating reset due to tx timeout

But we still had a lot of jitter on the line. Downgrading to 4.9.0 fixed the
problem, but this is not a permanent solution.

What else did we try:
* Increasing gc_threshX
net.ipv6.neigh.default.gc_thresh1 = 2048
net.ipv6.neigh.default.gc_thresh2 = 4096
net.ipv6.neigh.default.gc_thresh3 = 8192
=> Did not help

* Going to a backports kernel (5.7.0)
=> Did not help

@Frederik Kriewitz: What did you do to fix that problem?

Oliver


Re: IPv6 BGP & kernel 4.19

2020-03-16 Thread Clément Guivy

Thanks.

I found a solution which seems to be working so far, with the regular Debian 
4.19 kernel, on my 2 edge routers.


I set both net.ipv6.route.gc_thresh and max_size to 131072; the reasoning 
behind that is to have this limit above the number of routes in the full 
view, so that gc is not triggered too often.
Once the full view is loaded I can now perform an 'ip route get' lookup 
on each and every prefix without getting a "Network is unreachable" 
error (thanks for the tip, Basil), nor face a noticeable service 
disruption, and IPv6 BGP sessions have also been stable so far (i.e. for a 
few days).


If anyone reproduces this solution (or finds another one) I would be 
glad to know.



On 16/03/2020 12:41, Baptiste Jonglez wrote:

FYI, babeld seems to be affected by this same bug: 
https://github.com/jech/babeld/issues/50

The net.ipv6.route.max_size workaround is also mentioned there.

Baptiste

On 26-02-20, Basil Fillan wrote:

Hi,

We've also experienced this after upgrading a few routers to Debian Buster.
With a kernel bisect we found that a bug was introduced in the following
commit:

3b6761d18bc11f2af2a6fc494e9026d39593f22c

This bug was still present in master as of a few weeks ago.

It appears entries are added to the IPv6 route cache which aren't visible
from "ip -6 route show cache", but are causing the route cache garbage
collection system to trigger extremely often (every packet?) once it exceeds
the value of net.ipv6.route.max_size. Our original symptom was extreme
forwarding jitter caused within the ip6_dst_gc function (identified by some
spelunking with systemtap & perf) worsening as the size of the cache
increased. This was due to our max_size sysctl inadvertently being set to 1
million. Reducing this value to the default 4096 broke IPv6 forwarding
entirely on our test system under affected kernels. Our documentation had
this sysctl marked as the maximum number of IPv6 routes, so it looks like
the use changed at some point.

We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096)
for now, which fixed our immediate issue.

You can reproduce this by adding more than 4096 (default value of the
sysctl) routes to the kernel and running "ip route get" for each of them.
Once the route cache is filled, the error "RTNETLINK answers: Network is
unreachable" will be received for each subsequent "ip route get"
incantation, and v6 connectivity will be interrupted.

Thanks,

Basil


On 26/02/2020 20:38, Clément Guivy wrote:

Hi, did anyone find a solution or workaround regarding this issue?
Considering a router use case.
I have looked at rt6_stats, total route count is around 78k (full view),
and around 4100 entries in the cache at the moment on my first router
(forwarding a few Mb/s) and around 2500 entries on my second router
(forwarding less than 1 Mb/s).
I have reread the entire thread. At first, Alarig's research seemed to
lead to a neighbor management problem; my understanding is that the route
cache is something else entirely - or is it related somehow?


On 03/12/2019 19:29, Alarig Le Lay wrote:

We agree then, and I act as a router on all those machines.

On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:

     This is the result of PMTUd. But when you are a router, you don't
     need to do that, so it's mostly a problem for end hosts.

     On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay
      wrote:

     On 03/12/2019 14:16, Vincent Bernat wrote:

     The information needs to be stored somewhere.


     Why does it have to be stored? It’s not really my problem if someone
     else has a non-standard MTU and can’t do TCP-MSS or PMTUd.




Re: IPv6 BGP & kernel 4.19

2020-03-16 Thread Baptiste Jonglez
FYI, babeld seems to be affected by this same bug: 
https://github.com/jech/babeld/issues/50

The net.ipv6.route.max_size workaround is also mentioned there.

Baptiste

On 26-02-20, Basil Fillan wrote:
> Hi,
> 
> We've also experienced this after upgrading a few routers to Debian Buster.
> With a kernel bisect we found that a bug was introduced in the following
> commit:
> 
> 3b6761d18bc11f2af2a6fc494e9026d39593f22c
> 
> This bug was still present in master as of a few weeks ago.
> 
> It appears entries are added to the IPv6 route cache which aren't visible
> from "ip -6 route show cache", but are causing the route cache garbage
> collection system to trigger extremely often (every packet?) once it exceeds
> the value of net.ipv6.route.max_size. Our original symptom was extreme
> forwarding jitter caused within the ip6_dst_gc function (identified by some
> spelunking with systemtap & perf) worsening as the size of the cache
> increased. This was due to our max_size sysctl inadvertently being set to 1
> million. Reducing this value to the default 4096 broke IPv6 forwarding
> entirely on our test system under affected kernels. Our documentation had
> this sysctl marked as the maximum number of IPv6 routes, so it looks like
> the use changed at some point.
> 
> We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096)
> for now, which fixed our immediate issue.
> 
> You can reproduce this by adding more than 4096 (default value of the
> sysctl) routes to the kernel and running "ip route get" for each of them.
> Once the route cache is filled, the error "RTNETLINK answers: Network is
> unreachable" will be received for each subsequent "ip route get"
> incantation, and v6 connectivity will be interrupted.
> 
> Thanks,
> 
> Basil
> 
> 
> On 26/02/2020 20:38, Clément Guivy wrote:
> > Hi, did anyone find a solution or workaround regarding this issue?
> > Considering a router use case.
> > I have looked at rt6_stats, total route count is around 78k (full view),
> > and around 4100 entries in the cache at the moment on my first router
> > (forwarding a few Mb/s) and around 2500 entries on my second router
> > (forwarding less than 1 Mb/s).
> > I have reread the entire thread. At first, Alarig's research seemed to
> > lead to a neighbor management problem; my understanding is that the route
> > cache is something else entirely - or is it related somehow?
> > 
> > 
> > On 03/12/2019 19:29, Alarig Le Lay wrote:
> > > We agree then, and I act as a router on all those machines.
> > > 
> > > On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:
> > > 
> > >     This is the result of PMTUd. But when you are a router, you don't
> > >     need to do that, so it's mostly a problem for end hosts.
> > > 
> > >     On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay
> > >      wrote:
> > > 
> > >     On 03/12/2019 14:16, Vincent Bernat wrote:
> > > 
> > >     The information needs to be stored somewhere.
> > > 
> > > 
> > >     Why does it have to be stored? It’s not really my problem if someone
> > >     else has a non-standard MTU and can’t do TCP-MSS or PMTUd.




Re: IPv6 BGP & kernel 4.19

2020-02-26 Thread Basil Fillan

Hi,

We've also experienced this after upgrading a few routers to Debian 
Buster. With a kernel bisect we found that a bug was introduced in the 
following commit:


3b6761d18bc11f2af2a6fc494e9026d39593f22c

This bug was still present in master as of a few weeks ago.

It appears entries are added to the IPv6 route cache which aren't 
visible from "ip -6 route show cache", but are causing the route cache 
garbage collection system to trigger extremely often (every packet?) 
once it exceeds the value of net.ipv6.route.max_size. Our original 
symptom was extreme forwarding jitter caused within the ip6_dst_gc 
function (identified by some spelunking with systemtap & perf) worsening 
as the size of the cache increased. This was due to our max_size sysctl 
inadvertently being set to 1 million. Reducing this value to the default 
4096 broke IPv6 forwarding entirely on our test system under affected 
kernels. Our documentation had this sysctl marked as the maximum number 
of IPv6 routes, so it looks like the use changed at some point.


We've rolled our routers back to kernel 4.9 (with the sysctl set to 
4096) for now, which fixed our immediate issue.


You can reproduce this by adding more than 4096 (default value of the 
sysctl) routes to the kernel and running "ip route get" for each of 
them. Once the route cache is filled, the error "RTNETLINK answers: 
Network is unreachable" will be received for each subsequent "ip route 
get" incantation, and v6 connectivity will be interrupted.


Thanks,

Basil


On 26/02/2020 20:38, Clément Guivy wrote:
Hi, did anyone find a solution or workaround regarding this issue? 
Considering a router use case.
I have looked at rt6_stats, total route count is around 78k (full view), 
and around 4100 entries in the cache at the moment on my first router 
(forwarding a few Mb/s) and around 2500 entries on my second router 
(forwarding less than 1 Mb/s).
I have reread the entire thread. At first, Alarig's research seemed to 
lead to a neighbor management problem; my understanding is that the route 
cache is something else entirely - or is it related somehow?



On 03/12/2019 19:29, Alarig Le Lay wrote:

We agree then, and I act as a router on all those machines.

On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:


    This is the result of PMTUd. But when you are a router, you don't
    need to do that, so it's mostly a problem for end hosts.

    On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay
     wrote:

    On 03/12/2019 14:16, Vincent Bernat wrote:

    The information needs to be stored somewhere.


    Why does it have to be stored? It’s not really my problem if someone
    else has a non-standard MTU and can’t do TCP-MSS or PMTUd.


Re: IPv6 BGP & kernel 4.19

2020-02-26 Thread Clément Guivy
Hi, did anyone find a solution or workaround regarding this issue? 
Considering a router use case.
I have looked at rt6_stats, total route count is around 78k (full view), 
and around 4100 entries in the cache at the moment on my first router 
(forwarding a few Mb/s) and around 2500 entries on my second router 
(forwarding less than 1 Mb/s).
I have reread the entire thread. At first, Alarig's research seemed to 
lead to a neighbor management problem; my understanding is that the route 
cache is something else entirely - or is it related somehow?



On 03/12/2019 19:29, Alarig Le Lay wrote:

We agree then, and I act as a router on all those machines.

On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:


This is the result of PMTUd. But when you are a router, you don't
need to do that, so it's mostly a problem for end hosts.

On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay
 wrote:

On 03/12/2019 14:16, Vincent Bernat wrote:

The information needs to be stored somewhere.


Why does it have to be stored? It’s not really my problem if someone else has
a non-standard MTU and can’t do TCP-MSS or PMTUd.


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Alarig Le Lay
We agree then, and I act as a router on all those machines.

On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat wrote:
>This is the result of PMTUd. But when you are a router, you don't need
>to do that, so it's mostly a problem for end hosts.
>
>On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay
> wrote:
>>On 03/12/2019 14:16, Vincent Bernat wrote:
>>> The information needs to be stored somewhere.
>>
>>Why does it have to be stored? It’s not really my problem if someone else
>>has a non-standard MTU and can’t do TCP-MSS or PMTUd.
>>
>>-- 
>>Alarig
>
>-- 
>Sent from my Android device with K-9 Mail. Please excuse my brevity.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Alarig Le Lay
On 03/12/2019 14:16, Vincent Bernat wrote:
> The information needs to be stored somewhere.

Why does it have to be stored? It’s not really my problem if someone else has
a non-standard MTU and can’t do TCP-MSS or PMTUd.

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Vincent Bernat
 ❦  3 December 2019 12:48 +01, Alarig Le Lay :

>> It's not unexpected. A cache entry is for a /128.
>
> When I’m routing 80k prefixes I don’t want to have n /128 routes because
> someone doesn’t have an MTU of 1500. Is there a way to disable this
> behaviour?

I don't think there is. The information needs to be stored somewhere.
With IPv6, they are materialized as regular route entries tagged as
"cached routes". With IPv4, they are stored inside a route entry.
-- 
Don't stop with your first draft.
- The Elements of Programming Style (Kernighan & Plauger)


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Alarig Le Lay
On 03/12/2019 11:58, Vincent Bernat wrote:
> It's not unexpected. A cache entry is for a /128.

When I’m routing 80k prefixes I don’t want to have n /128 routes because
someone doesn’t have an MTU of 1500. Is there a way to disable this behaviour?

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Vincent Bernat
 ❦  3 December 2019 11:46 +01, Alarig Le Lay :

> So, I have more routes in cache than in FIB on my two core routers; I’m
> pretty sure there is a bug there :p

It's not unexpected. A cache entry is for a /128.

> I have fewer routes in cache on 4.14 kernels but more traffic.
>
> I don’t know which function is feeding the cache, but I think that it’s
> doing too much.

The function is ip6_rt_cache_alloc(). It is called on PMTU exceptions,
on redirects, and in this last case, which I currently fail to
understand:

> ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set
> 
> This patch always creates RTF_CACHE clone with DST_NOCACHE
> when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to
> the fl6->daddr.


-- 
It is a wise father that knows his own child.
-- William Shakespeare, "The Merchant of Venice"


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Alarig Le Lay
On Tue 3 Dec 09:40:31 2019, Vincent Bernat wrote:
> So, there are 0x56 entries in the cache. Isn't that clear? :)
> 
> https://elixir.bootlin.com/linux/latest/source/net/ipv6/route.c#L6006

I did a quick test on some routers:

core01-arendal, no full view, on my own ASN, not much traffic, using
tunnels
https://pix.milkywan.fr/apWaD84h.png
core01-arendal ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
86 (0056)
86 (0056)
86 (0056)
core01-arendal ~ # ip -6 r | wc -l
64
core01-arendal ~ # uname -a
Linux core01-arendal.no.swordarmor.fr 4.19.86-gentoo #1 SMP Mon Dec 2 
19:02:33 CET 2019 x86_64 AMD GX-412TC SOC AuthenticAMD GNU/Linux

core02-arendal, no full view, on my own ASN, not much traffic, using
tunnels
https://pix.milkywan.fr/NF3jNY9K.png
core02-arendal ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
28 (001c)
30 (001e)
30 (001e)
core02-arendal ~ # ip -6 r | wc -l
39
core02-arendal ~ # uname -a
Linux core02-arendal.no.swordarmor.fr 4.19.86-gentoo #1 SMP Mon Dec 2 
22:08:21 CET 2019 x86_64 AMD G-T40E Processor AuthenticAMD GNU/Linux

edge01-terrahost, full view, on my own ASN, not much traffic, using
one tunnel
https://pix.milkywan.fr/6AVwYkY8.png
edge01-terrahost ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
96 (0060)
101 (0065)
101 (0065)
edge01-terrahost ~ # ip -6 r | wc -l
77439
edge01-terrahost ~ # uname -a
Linux edge01-terrahost.no.swordarmor.fr 4.19.82-gentoo #2 SMP Tue Nov 
12 22:08:28 CET 2019 x86_64 Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz 
GenuineIntel GNU/Linux

edge02-fjordane, full view, on my own ASN, not much traffic, using
one tunnel
https://pix.milkywan.fr/J4rOuylq.png
edge02-fjordane ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
110 (006e) 
110 (006e) 
110 (006e)
edge02-fjordane ~ # ip -6 r | wc -l
77433
edge02-fjordane ~ # uname -a
Linux edge02-fjordane.no.swordarmor.fr 4.19.86-gentoo #1 SMP Thu Nov 28 
16:47:53 CET 2019 x86_64 Common KVM processor GenuineIntel GNU/Linux

regis, full view, on my own ASN, a bit more traffic, using one tunnel
https://pix.milkywan.fr/5XeaK2du.png
regis ~ # while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
0 (0000)
1 (0001)
1 (0001)
regis ~ # ip -6 r | wc -l
77538
regis ~ # uname -a
Linux regis.swordarmor.fr 4.14.83-gentoo #2 SMP Sat Feb 2 16:50:41 CET 
2019 x86_64 Intel(R) Xeon(R) CPU X3450 @ 2.67GHz GenuineIntel GNU/Linux

asbr02, full view, on a not-for-profit ASN providing services for others,
100M of traffic, using one tunnel
https://pix.milkywan.fr/l1hfAAIn.png
alarig@asbr02 ~ $ while :; do awk --non-decimal-data '{ print ("0x"$6)+0, "(" $6 ")" }' /proc/net/rt6_stats; sleep 120; done
4 (0004)
3 (0003)
0 (0000)
alarig@asbr02 ~ $ ip -6 r | wc -l
77525
alarig@asbr02 ~ $ uname -a
Linux asbr02.cogent-rns.grifon.fr 4.14.156-gentoo #1 SMP Tue Dec 3 
09:53:23 CET 2019 x86_64 Intel(R) Xeon(R) CPU X3450 @ 2.67GHz GenuineIntel 
GNU/Linux


So, I have more routes in cache than in FIB on my two core routers; I’m
pretty sure there is a bug there :p
I have fewer routes in cache on 4.14 kernels but more traffic.

I don’t know which function is feeding the cache, but I think that it’s
doing too much.

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Vincent Bernat
 ❦  3 December 2019 08:56 +01, Alarig Le Lay :

>> Just to be clear: I did forget this fact and therefore my initial
>> recommendation to increase max_size with more than 4096 active hosts
>> does not apply anymore (as long as you have a 4.2+ kernel). Keep the
>> default value and watch `/proc/net/rt6_stats`.
>
> core01-arendal ~ # cat /proc/net/rt6_stats
> 0048 002c 5e56 0050 0000 0056 0020
>
> Is it supposed to be understandable? :D

So, there are 0x56 entries in the cache. Isn't that clear? :)

https://elixir.bootlin.com/linux/latest/source/net/ipv6/route.c#L6006
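
Going by that source (recent kernels), the seven hex fields should be:
fib_nodes, fib_route_nodes, fib_rt_alloc, fib_rt_entries, fib_rt_cache,
dst cache entries, discarded routes. So a decode of the interesting ones
could look like:

awk --non-decimal-data '{ printf "nodes=%d rt_entries=%d dst_cache=%d\n", ("0x"$1)+0, ("0x"$4)+0, ("0x"$6)+0 }' /proc/net/rt6_stats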

-- 
Modularise.  Use subroutines.
- The Elements of Programming Style (Kernighan & Plauger)


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Alarig Le Lay
On 02/12/2019 23:04, Vincent Bernat wrote:
> Just to be clear: I did forget this fact and therefore my initial
> recommendation to increase max_size with more than 4096 active hosts
> does not apply anymore (as long as you have a 4.2+ kernel). Keep the
> default value and watch `/proc/net/rt6_stats`.

core01-arendal ~ # cat /proc/net/rt6_stats
0048 002c 5e56 0050 0000 0056 0020

Is it supposed to be understandable? :D

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Vincent Bernat
 ❦  2 December 2019 22:48 +01, Vincent Bernat :

> Also, from 4.2, the cache entries are only created for exceptions (PMTU
> notably). So, in fact, the initial value should be mostly safe. You can
> monitor it with `/proc/net/rt6_stats`. This is the second-to-last value. If
> you can share what you have, I would be curious to know how low it is
> (compared to the 4th entry notably).

Just to be clear: I did forget this fact and therefore my initial
recommendation to increase max_size with more than 4096 active hosts
does not apply anymore (as long as you have a 4.2+ kernel). Keep the
default value and watch `/proc/net/rt6_stats`.
-- 
Program defensively.
- The Elements of Programming Style (Kernighan & Plauger)


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Vincent Bernat
 ❦  2 December 2019 21:58 +01, Alarig Le Lay :

>> For IPv6, this is the size of the routing cache. If you have more than
>> 4096 active hosts, Linux will aggressively try to run garbage
>> collection, eating CPU. In this case, increase both
>> net.ipv6.route.max_size and net.ipv6.route.gc_thresh.
>
> Do you know what the risks are when we raise those parameters? A bit
> more RAM consumption?

You are mostly safe with RAM. Increasing the value to 512k would eat
256MB of RAM. However, if an attacker is still able to overflow the
cache, it is costly in terms of CPU. This is a bit similar to the route
cache for IPv4, so you need to play with threshold, interval and timeout
to try to keep CPU usage down, but ultimately, a fast enough attacker
can do a lot of damage here. I don't have real-life experience with this
aspect.
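
(The arithmetic behind that figure, presumably assuming roughly 512 bytes
per cache entry: 512 * 1024 entries * 512 bytes = 256 MiB.)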

Also, from 4.2, the cache entries are only created for exceptions (PMTU
notably). So, in fact, the initial value should be mostly safe. You can
monitor it with `/proc/net/rt6_stats`. This is the second-to-last value. If
you can share what you have, I would be curious to know how low it is
(compared to the 4th entry notably).
-- 
Writing is turning one's worst moments into money.
-- J.P. Donleavy


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Alarig Le Lay
Hi Vincent,

On Mon 2 Dec 21:38:21 2019, Vincent Bernat wrote:
> For IPv6, this is the size of the routing cache. If you have more than
> 4096 active hosts, Linux will aggressively try to run garbage
> collection, eating CPU. In this case, increase both
> net.ipv6.route.max_size and net.ipv6.route.gc_thresh.

Do you know what the risks are when we raise those parameters? A bit
more RAM consumption?

Regards,
-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Vincent Bernat
 ❦  1 December 2019 19:20 +01, Clément Guivy :

> Hi, that's good news. One thing that still confuses me though is that
> the default values for these settings are the same in Debian 9 (4.9
> kernel) and Debian 10 (4.19 kernel), so I would expect the behaviour
> to be the same between both versions in that regard.
> Also I'm not sure I understand what this max_size parameter actually
> does, since I have it at the default value (4096) and yet the ipv6 route
> table is >70k entries at the moment without the kernel complaining.

For IPv4, the parameter is ignored since Linux 3.6. For IPv6, this is
the size of the routing cache. If you have more than 4096 active hosts,
Linux will aggressively try to run garbage collection, eating CPU. In
this case, increase both net.ipv6.route.max_size and
net.ipv6.route.gc_thresh. That's a pity, but this value is not easily
observable, so it's hard to know when you hit it. Also, while IPv4
recently got back the ability to enumerate the cache, this is not the
case for IPv6.

This setting is a bit confusing as it is not documented and, in the past
(before Linux 3.0), it limited the whole IPv6 route table.
-- 
Write clearly - don't sacrifice clarity for "efficiency".
- The Elements of Programming Style (Kernighan & Plauger)


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Andrew Hearn
On 01/12/2019 18:20, Clément Guivy wrote:
> On 01/12/2019 13:43, Frederik Kriewitz wrote:
>> This is our current suspicion too. neighbours and routes are well
>> below 4096 in our case. We also had to adjust
>> net.ipv6.neigh.default.gc_thresh1/2/3. Since the adjustment it's been
>> working fine.
>>
> 
> Hi, that's good news. One thing that still confuses me though is that
> the default values for these settings are the same in Debian 9 (4.9
> kernel) and Debian 10 (4.19 kernel), so I would expect the behaviour to
> be the same between both versions in that regard.
> Also I'm not sure I understand what this max_size parameter actually
> does, since I have it at the default value (4096) and yet the ipv6 route
> table is >70k entries at the moment without the kernel complaining.

To add our info -

We're using Intel 82599ES NICs.

We have full table on v4 and v6, and about 20 neighbors on each.

Our route/max_size for v4 and v6 are defaults (2M and 4096
respectively) - and as noted, these values are the same on our Stretch
and Buster boxes.

Andrew


Re: IPv6 BGP & kernel 4.19

2019-12-01 Thread Clément Guivy

On 01/12/2019 13:43, Frederik Kriewitz wrote:

This is our current suspicion too. neighbours and routes are well
below 4096 in our case. We also had to adjust
net.ipv6.neigh.default.gc_thresh1/2/3. Since the adjustment it's been
working fine.



Hi, that's good news. One thing that still confuses me though is that 
the default values for these settings are the same in Debian 9 (4.9 
kernel) and Debian 10 (4.19 kernel), so I would expect the behaviour to 
be the same between both versions in that regard.
Also I'm not sure I understand what this max_size parameter actually 
does, since I have it at the default value (4096) and yet the ipv6 route 
table is >70k entries at the moment without the kernel complaining.


Re: IPv6 BGP & kernel 4.19

2019-12-01 Thread Frederik Kriewitz
On Sun, Dec 1, 2019 at 12:57 PM Daniel Suchy  wrote:
> One idea that comes to mind is the default kernel limit for IPv6 routes
> in memory (sysctl net.ipv6.route.max_size); that default is quite
> low for full-BGP/DFZ IPv6 deployments and it's still set to 4096 on
> Debian/Buster with stock kernels. Can people having issues with 4.19
> kernels check the sysctl mentioned above?

This is our current suspicion too. neighbours and routes are well
below 4096 in our case. We also had to adjust
net.ipv6.neigh.default.gc_thresh1/2/3. Since the adjustment it's been
working fine.


Re: IPv6 BGP & kernel 4.19

2019-12-01 Thread Daniel Suchy
Hello,
I'm running the bird 1.6.x branch (packages from Debian/Buster; currently
1.6.6) on recent 4.19 custom-built kernels without any issues (on armhf
hardware).

My BGP sessions are carrying only a few routes (default + some more
specifics).

One idea that comes to mind is the default kernel limit for IPv6 routes
in memory (sysctl net.ipv6.route.max_size); that default is quite
low for full-BGP/DFZ IPv6 deployments and it's still set to 4096 on
Debian/Buster with stock kernels. Can people having issues with 4.19
kernels check the sysctl mentioned above?
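
That is:

sysctl net.ipv6.route.max_size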

- Daniel

On 11/21/19 6:12 PM, Ondrej Zajicek wrote:
> On Thu, Nov 21, 2019 at 04:09:24PM +, Andrew Hearn wrote:
>>> Without traffic through the box (all IPv6 prefixes filtered) the bgp
>>> session is stable. With traffic the bgp session dies after some time
>>> and ssh connections in the default table freeze.
>>>
>>> I did some packet captures and saw tcp retransmissions before hold timer
>>> expires.
>>>
>>> Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel
>>> bisect until September. (And no clue where to start and how to trigger
>>> the bug faster.)
>>
>> Sorry to bring up a fairly old thread...
>>
>> We believe we are seeing this problem too, since a Stretch->Buster
>> upgrade - was there a solution to this?
> 
> Perhaps try kernel 5.2.x or 5.3.x from buster-backports?
> 


Re: IPv6 BGP & kernel 4.19

2019-12-01 Thread Benedikt Neuffer
Hi Frederik,

On 30.11.19 23:31, Frederik Kriewitz wrote:
> On Sat, Nov 30, 2019 at 12:26 PM Benedikt Neuffer
>  wrote:
> Which NICs are you using?

We are using Intel X520.

Regards,
Benedikt


-- 
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Benedikt Neuffer
Netze und Telekommunikation (NET)

Hermann-von-Helmholtz-Platz 1
Gebäude 442
Raum 185
76344 Eggenstein-Leopoldshafen

Telefon: +49 721 608-24502
Fax: +49 721 608-47763
E-Mail: benedikt.neuf...@kit.edu
Web: https://www.scc.kit.edu



Sitz der Körperschaft:
Kaiserstraße 12, 76131 Karlsruhe



KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft



Signaturversion: 19.1.0 beta



smime.p7s
Description: S/MIME Cryptographic Signature


Re: IPv6 BGP & kernel 4.19

2019-11-30 Thread Alarig Le Lay
On Sat 30 Nov 23:50:48 2019, Alarig Le Lay wrote:
> We are using “Intel Corporation 82576 Gigabit Network Connection” NICs.

And “Broadcom Limited NetXtreme II BCM5709 Gigabit Ethernet”, sorry I
forgot this box.

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-11-30 Thread Alarig Le Lay
On Sat 30 Nov 23:31:39 2019, Frederik Kriewitz wrote:
> We don't know if this might be NIC related yet. We're seeing it happen
> with Intel X710 NICs (With all offloading features disabled). Which
> NICs are you using?

We are using “Intel Corporation 82576 Gigabit Network Connection” NICs.

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-11-30 Thread Frederik Kriewitz
On Sat, Nov 30, 2019 at 12:26 PM Benedikt Neuffer
 wrote:
> as far as I can see one needs some traffic to reproduce the issue. Without
> traffic I haven't seen the issue.

Yes, we saw this behaviour too using the buster kernel. It seems to be
traffic and/or neighbours related.
Forwarding itself seems to work but neighbour discovery stops working
(that's why multicast based OSPF sessions are not affected).
In this state the kernel doesn't generate any neighbor solicitation
packets (not visible using tcpdump). Once the neighbour cache times
out IPv6 connectivity is broken.

We don't know if this might be NIC-related yet. We're seeing it happen
with Intel X710 NICs (with all offloading features disabled). Which
NICs are you using?
Resetting the NIC using ethtool -r $INTERFACE seems to have fixed it
once for us. The problem also fixes itself after ~90 to 110 minutes,
until it appears again.
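
One way to watch for the missing solicitations while the problem is
occurring - ICMPv6 type 135 is neighbour solicitation; this filter assumes
no extension headers in the packets:

tcpdump -ni "$INTERFACE" 'icmp6 and ip6[40] == 135'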


Re: IPv6 BGP & kernel 4.19

2019-11-30 Thread Alarig Le Lay
I saw it in production with ~20 VMs, but I don’t know how much is needed
to trigger it.

On Sat 30 Nov 11:43:29 2019, Stefan Jakob wrote:
> Can anyone provide test configs?
> 
> Is it testable inside two or three VMs?
> 
> Could offer 5.3.X tests here.
> 
> On Sat, Nov 23, 2019 at 6:48 PM Alarig Le Lay  wrote:
> >
> > On Thu 21 Nov 18:12:17 2019, Ondrej Zajicek wrote:
> > > Perhaps try kernel 5.2.x or 5.3.x from buster-backports?
> >
> > I’m very interested by test results from newer kernels than 5.0.x
> >
> > --
> > Alarig


Re: IPv6 BGP & kernel 4.19

2019-11-30 Thread Benedikt Neuffer
Hi all,

On 30.11.19 11:43, Stefan Jakob wrote:
> Can anyone provide test configs?
> 
> Is it testable inside two or three VMs?
> 
> Could offer 5.3.X tests here.
> 
> On Sat, Nov 23, 2019 at 6:48 PM Alarig Le Lay  wrote:
>>
>> On Thu 21 Nov 18:12:17 2019, Ondrej Zajicek wrote:
>>> Perhaps try kernel 5.2.x or 5.3.x from buster-backports?
>>
>> I’m very interested by test results from newer kernels than 5.0.x
>>
>> --
>> Alarig
> 

as far as I can see one needs some traffic to reproduce the issue. Without
traffic I haven't seen the issue.

Regards,
Benedikt

-- 
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Benedikt Neuffer
Netze und Telekommunikation (NET)

Hermann-von-Helmholtz-Platz 1
Gebäude 442
Raum 185
76344 Eggenstein-Leopoldshafen

Telefon: +49 721 608-24502
Fax: +49 721 608-47763
E-Mail: benedikt.neuf...@kit.edu
Web: https://www.scc.kit.edu



Sitz der Körperschaft:
Kaiserstraße 12, 76131 Karlsruhe



KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft



Signaturversion: 19.1.0 beta



smime.p7s
Description: S/MIME Cryptographic Signature


Re: IPv6 BGP & kernel 4.19

2019-11-30 Thread Stefan Jakob
Can anyone provide test configs?

Is it testable inside two or three VMs?

Could offer 5.3.X tests here.

On Sat, Nov 23, 2019 at 6:48 PM Alarig Le Lay  wrote:
>
> On Thu 21 Nov 18:12:17 2019, Ondrej Zajicek wrote:
> > Perhaps try kernel 5.2.x or 5.3.x from buster-backports?
>
> I’m very interested by test results from newer kernels than 5.0.x
>
> --
> Alarig



Re: IPv6 BGP & kernel 4.19

2019-11-23 Thread Alarig Le Lay
On Thu 21 Nov 18:12:17 2019, Ondrej Zajicek wrote:
> Perhaps try kernel 5.2.x or 5.3.x from buster-backports?

I’m very interested by test results from newer kernels than 5.0.x

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-11-21 Thread Ondrej Zajicek
On Thu, Nov 21, 2019 at 04:09:24PM +, Andrew Hearn wrote:
> > Without traffic through the box (all IPv6 prefixes filtered) the bgp
> > session is stable. With traffic the bgp session dies after some time
> > and ssh connections in the default table freeze.
> > 
> > I did some packet captures and saw tcp retransmissions before hold timer
> > expires.
> > 
> > Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel
> > bisect until September. (And no clue where to start and how to trigger
> > the bug faster.)
> 
> Sorry to bring up a fairly old thread...
> 
> We believe we are seeing this problem too, since a Stretch->Buster
> upgrade - was there a solution to this?

Perhaps try kernel 5.2.x or 5.3.x from buster-backports?

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: IPv6 BGP & kernel 4.19

2019-11-21 Thread Alarig Le Lay
Hi,

On 21/11/2019 17:46, Benedikt Neuffer wrote:
> Hi Andrew,
> 
> On 21.11.19 17:09, Andrew Hearn wrote:
>> Sorry to bring up a fairly old thread...
>>
>> We believe we are seeing this problem too, since a Stretch->Buster
>> upgrade - was there a solution to this?
>>
>> Thanks
> 
> The problem still exists. We are still running on kernel 4.14.x. I had
> no time to do any further debugging.
> 
> Regards,
> Benedikt
> 
> 

I also had the problem with 5.x on proxmox 6. But I didn’t begin my
debugging either, E_NOTIME…

-- 
Alarig


Re: IPv6 BGP & kernel 4.19

2019-11-21 Thread Benedikt Neuffer
Hi Andrew,

On 21.11.19 17:09, Andrew Hearn wrote:
> Sorry to bring up a fairly old thread...
> 
> We believe we are seeing this problem too, since a Stretch->Buster
> upgrade - was there a solution to this?
> 
> Thanks

The problem still exists. We are still running on kernel 4.14.x. I had
no time to do any further debugging.

Regards,
Benedikt


-- 
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Benedikt Neuffer
Netze und Telekommunikation (NET)

Hermann-von-Helmholtz-Platz 1
Gebäude 442
Raum 185
76344 Eggenstein-Leopoldshafen

Telefon: +49 721 608-24502
Fax: +49 721 608-47763
E-Mail: benedikt.neuf...@kit.edu
Web: https://www.scc.kit.edu



Sitz der Körperschaft:
Kaiserstraße 12, 76131 Karlsruhe



KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft



Signaturversion: 19.1.0 beta



smime.p7s
Description: S/MIME Cryptographic Signature


Re: IPv6 BGP & kernel 4.19

2019-11-21 Thread Andrew Hearn
On 20/06/2019 17:13, Benedikt Neuffer wrote:
> Hi,
> 
> On 19.06.19 20:09, Alarig Le Lay wrote:
>> Hi,
>>
>> On Wed 19 Jun 09:10:53 2019, Robert Sander wrote:
>>> Hi,
>>>
>>> our routers run on Debian stretch with bird 1.6.4 from
>>> bird.network.cz/debian.
>>>
>>> Yesterday I tried kernel 4.19 from backports.debian.org and ran into a
>>> weird issue with IPv6 BGP sessions:
>>>
>>> All Peerings reported "Error: Hold timer expired" ca. every 40 minutes.
>>>
>>> IPv6 forwarding was flapping all the time.
>>>
>>> After rebooting into kernel 4.9 everything worked again.
>>>
>>> IPv4 BGP was not affected and also OSPF (v4 and v6). I could disable all
>>> IPv6 BGP peerings on this router and then it forwarded to another router
>>> learned via OSPF for IPv6 without issues.
>>>
>>> Has anyone seen such a behaviour?
>>
>> I’ve seen this with 4.19 on gentoo. For now I’m still running 4.14.
>> https://archives.gentoo.org/gentoo-user/message/fab628cc53e4a55589410f9dff6abd23
>>
> 
> Same here. Gentoo, Linux 4.19.52, Bird 2.0.4. I am running a full table
> using a separate VRF and the default table as management VRF.
> 
> Without traffic through the box (all IPv6 prefixes filtered) the bgp
> session is stable. With traffic the bgp session dies after some time
> and ssh connections in the default table freeze.
> 
> I did some packet captures and saw tcp retransmissions before hold timer
> expires.
> 
> Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel
> bisect until September. (And no clue where to start and how to trigger
> the bug faster.)

Sorry to bring up a fairly old thread...

We believe we are seeing this problem too, since a Stretch->Buster
upgrade - was there a solution to this?

Thanks

Andrew.



Re: IPv6 BGP & kernel 4.19

2019-06-20 Thread Benedikt Neuffer
Hi,

On 19.06.19 20:09, Alarig Le Lay wrote:
> Hi,
> 
> On Wed 19 Jun 09:10:53 2019, Robert Sander wrote:
>> Hi,
>>
>> our routers run on Debian stretch with bird 1.6.4 from
>> bird.network.cz/debian.
>>
>> Yesterday I tried kernel 4.19 from backports.debian.org and ran into a
>> weird issue with IPv6 BGP sessions:
>>
>> All Peerings reported "Error: Hold timer expired" ca. every 40 minutes.
>>
>> IPv6 forwarding was flapping all the time.
>>
>> After rebooting into kernel 4.9 everything worked again.
>>
>> IPv4 BGP was not affected and also OSPF (v4 and v6). I could disable all
>> IPv6 BGP peerings on this router and then it forwarded to another router
>> learned via OSPF for IPv6 without issues.
>>
>> Has anyone seen such a behaviour?
> 
> I’ve seen this with 4.19 on gentoo. For now I’m still running 4.14.
> https://archives.gentoo.org/gentoo-user/message/fab628cc53e4a55589410f9dff6abd23
> 

Same here. Gentoo, Linux 4.19.52, Bird 2.0.4. I am running a full table
using a separate VRF and the default table as management VRF.

Without traffic through the box (all IPv6 prefixes filtered) the bgp
session is stable. With traffic the bgp session dies after some time
and ssh connections in the default table freeze.

I did some packet captures and saw tcp retransmissions before hold timer
expires.

Kernel 4.14.127 is stable here, too. Sadly I have no time for a kernel
bisect until September. (And no clue where to start and how to trigger
the bug faster.)
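
For when time allows, the skeleton would be something like this in a kernel
git tree, using the good/bad versions known from this thread:

git bisect start
git bisect bad v4.19
git bisect good v4.14
# then build, boot and test each step, and mark it with
# "git bisect good" or "git bisect bad" until the commit is found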

Regards
Bene


-- 
Karlsruher Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)

Benedikt Neuffer
Netze und Telekommunikation (NET)

Hermann-von-Helmholtz-Platz 1
Gebäude 442
Raum 185
76344 Eggenstein-Leopoldshafen

Telefon: +49 721 608-24502
Fax: +49 721 608-47763
E-Mail: benedikt.neuf...@kit.edu
Web: https://www.scc.kit.edu



Sitz der Körperschaft:
Kaiserstraße 12, 76131 Karlsruhe



KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft



Signaturversion: 19.1.0 beta




smime.p7s
Description: S/MIME Cryptographic Signature


Re: IPv6 BGP & kernel 4.19

2019-06-19 Thread Alarig Le Lay
Hi,

On Wed 19 Jun 09:10:53 2019, Robert Sander wrote:
> Hi,
> 
> our routers run on Debian stretch with bird 1.6.4 from
> bird.network.cz/debian.
> 
> Yesterday I tried kernel 4.19 from backports.debian.org and ran into a
> weird issue with IPv6 BGP sessions:
> 
> All Peerings reported "Error: Hold timer expired" ca. every 40 minutes.
> 
> IPv6 forwarding was flapping all the time.
> 
> After rebooting into kernel 4.9 everything worked again.
> 
> IPv4 BGP was not affected and also OSPF (v4 and v6). I could disable all
> IPv6 BGP peerings on this router and then it forwarded to another router
> learned via OSPF for IPv6 without issues.
> 
> Has anyone seen such a behaviour?

I’ve seen this with 4.19 on gentoo. For now I’m still running 4.14.
https://archives.gentoo.org/gentoo-user/message/fab628cc53e4a55589410f9dff6abd23

-- 
Alarig