Re: Graceful shutdown request signal

2023-06-20 Thread Vincent Bernat

On 2023-06-20 21:07, Maria Matejka via Bird-users wrote:
Well, it's a pity that systemd doesn't allow for custom operations – in 
such case you could call "systemd graceful bird2" or "systemd restart 
bird2"…


It's still possible to use systemctl kill --signal=USR2 bird2


Re: BIRD Crashes

2022-08-19 Thread Vincent Bernat

On 2022-08-19 10:24, Ian Chilton wrote:

It doesn't - it just shows the test I did by killing sleep, which is the 
only thing that `coredumpctl list` shows (and there is only that one 
file in /var/lib/systemd/coredump/).


We start BIRD from systemd, with some (custom) unit files.

Did you just install systemd-coredump and now it creates core dumps for 
all processes, including your bird processes without any further changes 
(that's what I read implied)?


I have had systemd-coredump installed for quite some time, and yes, it 
creates core dumps for all processes with a few exceptions (ptraced 
processes, for example). I am using its default configuration.


Re: BIRD Crashes

2022-08-18 Thread Vincent Bernat

On 2022-08-18 17:57, Ian Chilton wrote:
When the crash happened again yesterday, I hoped to have a core file to 
send, but there is no sign of it having generated one :(


This works for me. What is "coredumpctl" saying about the crash 
("coredumpctl info -1")? If you installed bird from a package, you may 
also want to install bird-dbgsym to help debugging (but this is not 
necessary to get the coredump).


Re: Let packets from different BGP go to different routing tables

2022-07-13 Thread Vincent Bernat

On 2022-07-13 08:08, Brandon Zhi wrote:

We created a bgp_v6 (IBGP) session on tunnel1 that allows downstream BGP 
sessions like HE(Hurricane Electric) and put the routing table into 
table 147.


Create bgp_v6_own(IBGP) on tunnel2 to transmit those routing tables from 
BGP that cannot carry downstream to Table 247


You can use the pipe protocol to copy some routes from one table to 
another. So, you'll need one table to receive routes from BGP, then you 
can have two "pipe" protocols to copy them into table 147 and table 247.


[PATCH] Doc: fix mating -> matching in flowspec section

2022-04-20 Thread Vincent Bernat
---
 doc/bird.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/bird.sgml b/doc/bird.sgml
index 9c4a6f68a447..1580facd8155 100644
--- a/doc/bird.sgml
+++ b/doc/bird.sgml
@@ -5251,7 +5251,7 @@ Note that for negated matches, value must be either zero or equal to bitmask
port 1..1023,1194,3306).
 
dport 
-   Set a mating destination port numbers (e.g. dport 49151).
+   Set a matching destination port numbers (e.g. dport 49151).
 
sport 
Set a matching source port numbers (e.g. sport = 0).
-- 
2.35.2



Re: [PATCH] Lib: accept 240.0.0.0/4 as a valid range

2022-03-16 Thread Vincent Bernat
 ❦ 16 March 2022 20:21 +01, Ondrej Zajicek:

> Updated BIRD to accept 240/4 as 'site' scope. We went with slightly
> different patch:
>
> https://gitlab.nic.cz/labs/bird/-/commit/269bfff9bf4b2349248bb48ff61009cf1c5a4aec

Not related, but there is a .orig file lying around in Git.
-- 
October 12, the Discovery.

It was wonderful to find America, but it would have been more wonderful to miss
it.
-- Mark Twain, "Pudd'nhead Wilson's Calendar"


[PATCH v2] Lib: accept 240.0.0.0/4 as a valid range

2022-03-14 Thread Vincent Bernat
240.0.0.0/4 is marked as reserved and considered invalid by BIRD. At
work, we are using this range internally since all RFC 1918 ranges are
full and 100.64.0.0/10 is already used too. BIRD complains loudly for
each interface using this range.

This change makes it possible to use this range. I have used scope
"universe", but I would be happy with "site" too. Although this is
widely discussed, I don't think 240/4 will ever become routable on the
Internet.

As a bonus, I added some comments and unrolled the condition into one
test per block. I have also added some hints for the compiler to avoid
using jumps in the hot path (tested on Godbolt, see
https://godbolt.org/z/rGjz336K3).
---
 lib/ip.c| 28 +++-
 sysdep/config.h |  5 +
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/lib/ip.c b/lib/ip.c
index fcc72cafb4de..4d0dff636e17 100644
--- a/lib/ip.c
+++ b/lib/ip.c
@@ -85,25 +85,19 @@ ip4_classify(ip4_addr ad)
   u32 a = _I(ad);
   u32 b = a >> 24U;
 
-  if (b && b <= 0xdf)
-  {
-if (b == 0x7f)
-  return IADDR_HOST | SCOPE_HOST;
-else if ((b == 0x0a) ||
-((a & 0xffff0000) == 0xc0a80000) ||
-((a & 0xfff00000) == 0xac100000))
-  return IADDR_HOST | SCOPE_SITE;
-else
-  return IADDR_HOST | SCOPE_UNIVERSE;
-  }
-
-  if (b >= 0xe0 && b <= 0xef)
+  if (unlikely(b == 0x00))
+return IADDR_INVALID;   /* 0.0.0.0/8   This network */
+  if (unlikely(b == 0x7f))  /* 127.0.0.0/8 Loopback */
+return IADDR_HOST | SCOPE_HOST;
+  if ((b == 0x0a) ||                        /* 10.0.0.0/8      Private-use */
+      ((a & 0xffff0000) == 0xc0a80000) ||   /* 192.168.0.0/16  Private-use */
+      ((a & 0xfff00000) == 0xac100000))     /* 172.16.0.0/12   Private-use */
+return IADDR_HOST | SCOPE_SITE;
+  if (unlikely(b >= 0xe0 && b <= 0xef)) /* 224.0.0.0/4 Multicast */
 return IADDR_MULTICAST | SCOPE_UNIVERSE;
-
-  if (a == 0xffffffff)
+  if (unlikely(a == 0xffffffff)) /* 255.255.255.255 Limited broadcast */
 return IADDR_BROADCAST | SCOPE_LINK;
-
-  return IADDR_INVALID;
+  return IADDR_HOST | SCOPE_UNIVERSE;
 }
 
 int
diff --git a/sysdep/config.h b/sysdep/config.h
index b0531844af9f..4d73543c3894 100644
--- a/sysdep/config.h
+++ b/sysdep/config.h
@@ -30,6 +30,11 @@
  */
 #include "sysdep/paths.h"
 
+/* Likely/unlikely macros */
+
+#define likely(x)   __builtin_expect((x),1)
+#define unlikely(x) __builtin_expect((x),0)
+
 /* Types */
 
 #include 
-- 
2.35.1
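
For readers who want to try the classification order from the v2 patch outside BIRD, here is a minimal, self-contained sketch. The function name classify_ipv4 and the numeric values of the IADDR_* and SCOPE_* constants are placeholders chosen for this example and are not taken from BIRD's headers. The address is assumed to be in host byte order, and the unlikely() macro relies on the GCC/Clang builtin used in the patch.

```c
#include <stdint.h>

/* Branch hint as in the patch; requires GCC or Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Placeholder values for illustration only (assumption, not BIRD's). */
enum { SCOPE_HOST = 0, SCOPE_LINK = 1, SCOPE_SITE = 2, SCOPE_UNIVERSE = 4 };
enum {
  IADDR_INVALID   = -1,
  IADDR_HOST      = 0x1000,
  IADDR_BROADCAST = 0x2000,
  IADDR_MULTICAST = 0x4000
};

/* Classify an IPv4 address given in host byte order. */
int
classify_ipv4(uint32_t a)
{
  uint32_t b = a >> 24;                         /* first octet */

  if (unlikely(b == 0x00))                      /* 0.0.0.0/8 */
    return IADDR_INVALID;
  if (unlikely(b == 0x7f))                      /* 127.0.0.0/8 */
    return IADDR_HOST | SCOPE_HOST;
  if ((b == 0x0a) ||                            /* 10.0.0.0/8 */
      ((a & 0xffff0000u) == 0xc0a80000u) ||     /* 192.168.0.0/16 */
      ((a & 0xfff00000u) == 0xac100000u))       /* 172.16.0.0/12 */
    return IADDR_HOST | SCOPE_SITE;
  if (unlikely(b >= 0xe0 && b <= 0xef))         /* 224.0.0.0/4 */
    return IADDR_MULTICAST | SCOPE_UNIVERSE;
  if (unlikely(a == 0xffffffffu))               /* 255.255.255.255 */
    return IADDR_BROADCAST | SCOPE_LINK;

  /* Everything else, including 240.0.0.0/4, is a regular host address. */
  return IADDR_HOST | SCOPE_UNIVERSE;
}
```

With this ordering, 240.0.0.1 (0xf0000001) falls through to the last return, whereas the old code reached IADDR_INVALID for it.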



[PATCH] Lib: accept 240.0.0.0/4 as a valid range

2022-03-14 Thread Vincent Bernat
240.0.0.0/4 is marked as reserved and considered invalid by BIRD. At
work, we are using this range internally since all RFC 1918 ranges are
full and 100.64.0.0/10 is already used too. BIRD complains loudly for
each interface using this range.

This change makes it possible to use this range. I have used scope
"universe", but I would be happy with "site" too. Although this is
widely discussed, I don't think 240/4 will ever become routable on the
Internet.

As a bonus, I added some comments and unrolled the condition into one
test per block. I have also added some hints for the compiler to avoid
using jumps in the hot path (tested on Godbolt, see
https://godbolt.org/z/rGjz336K3).
---
 lib/ip.c| 31 ---
 sysdep/config.h |  5 +
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/lib/ip.c b/lib/ip.c
index fcc72cafb4de..8f0f32d25d61 100644
--- a/lib/ip.c
+++ b/lib/ip.c
@@ -80,30 +80,23 @@ ip6_masklen(ip6_addr *a)
 }
 
 int
-ip4_classify(ip4_addr ad)
+ip4_classify(u32 a)
 {
-  u32 a = _I(ad);
   u32 b = a >> 24U;
 
-  if (b && b <= 0xdf)
-  {
-if (b == 0x7f)
-  return IADDR_HOST | SCOPE_HOST;
-else if ((b == 0x0a) ||
-((a & 0xffff0000) == 0xc0a80000) ||
-((a & 0xfff00000) == 0xac100000))
-  return IADDR_HOST | SCOPE_SITE;
-else
-  return IADDR_HOST | SCOPE_UNIVERSE;
-  }
-
-  if (b >= 0xe0 && b <= 0xef)
+  if (unlikely(b == 0x00))
+return IADDR_INVALID;   /* 0.0.0.0/8   This network */
+  if (unlikely(b == 0x7f))  /* 127.0.0.0/8 Loopback */
+return IADDR_HOST | SCOPE_HOST;
+  if ((b == 0x0a) ||                        /* 10.0.0.0/8      Private-use */
+      ((a & 0xffff0000) == 0xc0a80000) ||   /* 192.168.0.0/16  Private-use */
+      ((a & 0xfff00000) == 0xac100000))     /* 172.16.0.0/12   Private-use */
+return IADDR_HOST | SCOPE_SITE;
+  if (unlikely(b >= 0xe0 && b <= 0xef)) /* 224.0.0.0/4 Multicast */
 return IADDR_MULTICAST | SCOPE_UNIVERSE;
-
-  if (a == 0xffffffff)
+  if (unlikely(a == 0xffffffff)) /* 255.255.255.255 Limited broadcast */
 return IADDR_BROADCAST | SCOPE_LINK;
-
-  return IADDR_INVALID;
+  return IADDR_HOST | SCOPE_UNIVERSE;
 }
 
 int
diff --git a/sysdep/config.h b/sysdep/config.h
index b0531844af9f..4d73543c3894 100644
--- a/sysdep/config.h
+++ b/sysdep/config.h
@@ -30,6 +30,11 @@
  */
 #include "sysdep/paths.h"
 
+/* Likely/unlikely macros */
+
+#define likely(x)   __builtin_expect((x),1)
+#define unlikely(x) __builtin_expect((x),0)
+
 /* Types */
 
 #include 
-- 
2.35.1



Re: birdc configure exits with 0 on error

2022-03-10 Thread Vincent Bernat
 ❦ 11 March 2022 04:07 +01, Ondrej Zajicek:

> With this, we likely do not need "ExecReload=/usr/sbin/bird -p" from your 
> patch?
> (But the remaining changes for RPM are ok.)

Yes. Do you want another patch?
-- 
Things past redress and now with me past care.
-- William Shakespeare, "Richard II"


birdc configure exits with 0 on error

2022-03-10 Thread Vincent Bernat
Hey!

"birdc configure" (or any command in fact) exits with 0 on error. This
is a bit annoying because, when using "systemctl reload bird", we get no
notification that there is an error.

Looking at the source code, it seems there is no easy way to hack around
that. Commands do not report an error code and messages printed are not
tagged as errors.

Here is a small workaround for systemd:

>From 5cbc487c4d54033c688258a5361377f72f53c264 Mon Sep 17 00:00:00 2001
From: Vincent Bernat 
Date: Thu, 10 Mar 2022 19:36:53 +0100
Subject: [PATCH] Pkg: check configuration before reloading with systemd

Also, update the RPM version to use "birdc configure" instead of "kill
-HUP".
---
 distro/pkg/deb/bird2.bird.service | 1 +
 distro/pkg/rpm/bird.service   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/distro/pkg/deb/bird2.bird.service b/distro/pkg/deb/bird2.bird.service
index 37e75fb41c6a..ae48f7f46cfa 100644
--- a/distro/pkg/deb/bird2.bird.service
+++ b/distro/pkg/deb/bird2.bird.service
@@ -6,6 +6,7 @@ After=network.target
 EnvironmentFile=/etc/bird/envvars
 ExecStartPre=/usr/lib/bird/prepare-environment
 ExecStartPre=/usr/sbin/bird -p
+ExecReload=/usr/sbin/bird -p
 ExecReload=/usr/sbin/birdc configure
 ExecStart=/usr/sbin/bird -f -u $BIRD_RUN_USER -g $BIRD_RUN_GROUP $BIRD_ARGS
 Restart=on-abort
diff --git a/distro/pkg/rpm/bird.service b/distro/pkg/rpm/bird.service
index fa203c781905..aa6e12dc5489 100644
--- a/distro/pkg/rpm/bird.service
+++ b/distro/pkg/rpm/bird.service
@@ -5,8 +5,10 @@ After=network.target
 
 [Service]
 Type=simple
+ExecStartPre=/usr/sbin/bird -p
 ExecStart=/usr/sbin/bird -f -u bird -g bird
-ExecReload=/bin/kill -HUP $MAINPID
+ExecReload=/usr/sbin/bird -p
+ExecReload=/usr/sbin/birdc configure
 Restart=on-failure
 
 [Install]
-- 
2.35.1

-- 
Keep it simple to make it faster.
- The Elements of Programming Style (Kernighan & Plauger)


Re: Bird 2 config syntax

2021-09-21 Thread Vincent Bernat
 ❦ 21 September 2021 15:23 +02, Robert Sander:

> why is it not possible to define BGP sessions like this?
>
> protocol bgp g60 {
>   local as 64501;
>   neighbor as 64499;
>   ipv4 {
>   import all;
>   export all;
>   };
>   local 192.0.2.4;
>   neighbor 192.0.2.5;
>   };
>   ipv6 {
>   import all;
>   export all;
>   local 2001:db8::4;
>   neighbor 2001:db8::5;
>   };
> };
>
> It seems still necessary to write two bgp definitions separately for
> IPv4 and IPv6 repeating the AS numbers and name.
>
> I see no benefit from the Bird 1.6 notation here.

With BIRD2, you can transport your IPv4 routes over an IPv6 BGP session.
To avoid repeating the common parts, you can use templates.
-- 
Localise input and output in subroutines.
- The Elements of Programming Style (Kernighan & Plauger)


Re: BIRD 2.0.8

2021-04-08 Thread Vincent Bernat
 ❦  8 April 2021 11:00 +02, Peter Hurtenbach:

> Below are my steps I've done to import the new upstream version and
> build the package with git-buildpackage on a Debian Buster machine (or 
> chroot).

A simpler alternative:

apt install git-buildpackage pristine-tar pbuilder 
DIST=buster git-pbuilder create

gbp clone https://salsa.debian.org/debian/bird2.git
cd bird2
gbp import-orig --uscan
(do what you did with the patches)
dch -v 2.0.8-0
git commit -a -m "New upstream version"
gbp buildpackage --git-pbuilder --git-dist=buster
-- 
But, for my own part, it was Greek to me.
-- William Shakespeare, "Julius Caesar"


Re: BIRD 2.0.8

2021-03-23 Thread Vincent Bernat
 ❦ 23 March 2021 19:02 +01, Ondrej Zajicek:

>> Never mind, I found that I should also read the notes :D
>>   Notes:
>> 
>>   Automatic channel reloads based on RPKI changes are enabled by default,
>>   but require import table enabled when used in BGP import filter.
>> Looks like this did the trick!

> Yes, you should also get warning ("Automatic RPKI reload not active for 
> import")
> when run without import table enabled.

Hey!

How does `import table yes` interact with `import keep filtered`? They
look the same, except the former is BGP-specific.
-- 
It usually takes more than three weeks to prepare a good impromptu speech.
-- Mark Twain


Re: [PATCH] BGP: Add support for BGP hostname capability

2021-02-05 Thread Vincent Bernat
 ❦  3 February 2021 23:37 +01, Ondrej Zajicek:

> 1) Hostname has similar role like router id. It seems to me that there
> should be global hostname config property, like config->router_id,
> initialized like router_id in global_commit().

Done. Use `hostname "bla"`

> 2) Consequently, there should be global option to set it, and it should
> report in 'show status'.

Done.

> 3) Function calling uname() should be probably somewhere in
> sysdep/unix/ code

Not done. I don't know exactly where I would put it and it would be a
trivial wrapper.

> 4) The hostname field in bgp_caps should be ptr, allocated based on
> length, instead of preallocatted full-length buffer.

Done. I think I have used the pools correctly.

> 5) As Job Snijders wrote, the capability should be disabled by
> default.

Done. Can be enabled with `advertise hostname yes`.

> 6) Received hostname should be treated for unsafe characters like text
> from RFC 8203 (shutdown communication), see bgp_handle_message().

Done.

Also, reload works, but as I didn't write anything for that, it's a bit
of a mystery how BIRD knows there is a change.
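
As an illustration of points 3 and 6 above, the following standalone sketch shows one way a small uname(2) wrapper and a hostname sanitizer could look. Both function names are hypothetical and this is not the code from the patch; in particular, replacing control bytes with spaces is only an assumption about what "treated for unsafe characters" might mean here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/* Copy the node name reported by uname(2) into buf.
 * Returns 0 on success, -1 if uname() fails or buf is too small. */
int
get_node_name(char *buf, size_t len)
{
  struct utsname uts;

  if (uname(&uts) == -1)
    return -1;
  if (strlen(uts.nodename) >= len)
    return -1;

  strcpy(buf, uts.nodename);
  return 0;
}

/* Replace ASCII control bytes in a received hostname with spaces,
 * in place, so that it can be printed safely (hypothetical policy). */
void
sanitize_hostname(char *s)
{
  for (; *s; s++)
    if ((unsigned char) *s < 0x20 || (unsigned char) *s == 0x7f)
      *s = ' ';
}
```

A caller would typically skip advertising the capability when get_node_name() returns -1, which matches the behaviour described in the patch of disabling the capability when no hostname is available.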


[PATCH v2] BGP: Add support for BGP hostname capability

2021-02-05 Thread Vincent Bernat
This is an implementation of draft-walton-bgp-hostname-capability-02.
It has been implemented in FRR for quite some time and, in datacenters,
it gives nicer output than bare IP addresses.

It is disabled by default. The hostname is retrieved from uname(2) and
can be overridden with "hostname". The domain name is never set nor
displayed. I don't think the domain name is useful.

The return code of uname(2) is not checked, but I think a failure is
unlikely and, if it happens, it is handled nicely by disabling the
capability.

The draft is expired but I hope a second implementation could help
revive it.

Output from BIRD:

Local capabilities
  Multiprotocol
AF announced: ipv6
  Route refresh
  Graceful restart
  4-octet AS numbers
  Enhanced refresh
  Long-lived graceful restart
  Hostname: bird2
Neighbor capabilities
  Multiprotocol
AF announced: ipv6
  Route refresh
  Graceful restart
  4-octet AS numbers
  ADD-PATH
RX: ipv6
TX:
  Hostname: frr1

Output from FRR:

Neighbor            V    AS MsgRcvd MsgSent TblVer InQ OutQ  Up/Down State/PfxRcd PfxSnt
bird1(2001:db8::11) 4 65000     548     495      0   0    0 00:03:06            0      0
bird2(2001:db8::12) 4 65000     281     256      0   0    0 00:01:49            0      0
frr2(2001:db8::22)  4 65000     501     501      0   0    0 00:24:56            0      0

Signed-off-by: Vincent Bernat 
---
 conf/conf.c | 14 ++
 conf/conf.h |  1 +
 doc/bird.sgml   |  6 ++
 nest/cmds.c |  1 +
 nest/config.Y   |  6 +-
 proto/bgp/bgp.c |  3 +++
 proto/bgp/bgp.h |  2 ++
 proto/bgp/config.Y  |  2 ++
 proto/bgp/packets.c | 36 
 9 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/conf/conf.c b/conf/conf.c
index 6f64b5416c59..1b699c97361e 100644
--- a/conf/conf.c
+++ b/conf/conf.c
@@ -42,6 +42,7 @@
 
 #include 
 #include 
+#include <sys/utsname.h>
 
 #undef LOCAL_DEBUG
 
@@ -217,6 +218,19 @@ config_del_obstacle(struct config *c)
 static int
 global_commit(struct config *new, struct config *old)
 {
+  if (!new->hostname)
+{
+  struct utsname uts = {};
+  if (uname(&uts) == -1)
+log(L_WARN "Cannot determine hostname");
+  else
+  {
+char *hostname;
+hostname = lp_strdup(new->mem, uts.nodename);
+new->hostname = hostname;
+  }
+}
+
   if (!old)
 return 0;
 
diff --git a/conf/conf.h b/conf/conf.h
index 3e47c9185469..860d267aa7b9 100644
--- a/conf/conf.h
+++ b/conf/conf.h
@@ -40,6 +40,7 @@ struct config {
   struct timeformat tf_log;/* Time format for the logfile */
   struct timeformat tf_base;   /* Time format for other purposes */
   u32 gr_wait; /* Graceful restart wait timeout (sec) */
+  const char *hostname;/* Hostname */
 
   int cli_debug;   /* Tracing of CLI connections and commands */
   int latency_debug;   /* I/O loop tracks duration of each event */
diff --git a/doc/bird.sgml b/doc/bird.sgml
index 28b0e400b464..592a76a294ea 100644
--- a/doc/bird.sgml
+++ b/doc/bird.sgml
@@ -585,6 +585,9 @@ include "tablename.conf";;
See  section for detailed
description of interface patterns with extended clauses.
 
+   hostname "
+   Set hostname. Default: node name as returned by 'uname -n'.
+
graceful restart wait 

During graceful restart recovery, BIRD waits for convergence of routing
protocols. This option allows to specify a timeout for the recovery to
@@ -2536,6 +2539,9 @@ using the following configuration parameters:
This option is relevant to IPv4 mode with enabled capability
advertisement only. Default: on.
 
+   advertise hostname 

+   Advertise hostname capability along with the hostname. Default: off.
+
disable after error 

When an error is encountered (either locally or by the other side),
disable the instance automatically and wait for an administrator to fix
diff --git a/nest/cmds.c b/nest/cmds.c
index da4015cfba0d..18f39eb56c10 100644
--- a/nest/cmds.c
+++ b/nest/cmds.c
@@ -27,6 +27,7 @@ cmd_show_status(void)
   cli_msg(-1000, "BIRD " BIRD_VERSION);
   tm_format_time(tim, &config->tf_base, current_time());
   cli_msg(-1011, "Router ID is %R", config->router_id);
+  cli_msg(-1011, "Hostname is %s", config->hostname);
   cli_msg(-1011, "Current server time is %s", tim);
   tm_format_time(tim, &config->tf_base, boot_time);
   cli_msg(-1011, "Last reboot on %s", tim);
diff --git a/nest/config.Y b/nest/config.Y
index 0bb8ca51dba0..39bf61490001 100644
--- a/nest/config.Y
+++ b/nest/config.Y
@@ -87,7 +87,7 @@ proto_postconfig(void)
 
 CF_DECLS
 
-CF_KE

Re: [PATCH] BGP: Add support for BGP hostname capability

2021-02-03 Thread Vincent Bernat
 ❦  3 February 2021 21:27 +01, Job Snijders:

> I recommend adjusting the patch in such a way that the capability is
> only exchanged with specific neighbors where the capability has been
> explicitly enabled through neighbor/group specific configuration.

Yes, I'll do that. I am also OK with the patch staying just a patch if
we feel the draft is unlikely to get consensus and become a draft. I
have pinged Dinesh on Twitter and he told me he may renew the draft.
-- 
Don't compare floating point numbers just for equality.
- The Elements of Programming Style (Kernighan & Plauger)


[PATCH] BGP: Add support for BGP hostname capability

2021-02-03 Thread Vincent Bernat
This is an implementation of draft-walton-bgp-hostname-capability-02.
It has been implemented in FRR for quite some time and, in datacenters,
it gives nicer output than bare IP addresses.

Currently, it is always enabled. The hostname is retrieved from
uname(2) and not configurable. The domain name is never set nor
displayed. I don't think being able to set the hostname is very
important. I don't think the domain name is useful. However, maybe the
capability should not be enabled by default.

Also, the case where the capability is supported but the hostname is
empty is handled as if the capability was not supported. I think it is
not worth handling such an edge case. The RFC doesn't say if hostname is
optional or not.

The return code of uname(2) is not checked, but I think a failure is
unlikely and, if it happens, it is handled nicely by disabling the
capability.

The draft is expired but I hope a second implementation could help
revive it.

Output from BIRD:

Local capabilities
  Multiprotocol
AF announced: ipv6
  Route refresh
  Graceful restart
  4-octet AS numbers
  Enhanced refresh
  Long-lived graceful restart
  Hostname: bird2
Neighbor capabilities
  Multiprotocol
AF announced: ipv6
  Route refresh
  Graceful restart
  4-octet AS numbers
  ADD-PATH
RX: ipv6
TX:
  Hostname: frr1

Output from FRR:

Neighbor            V    AS MsgRcvd MsgSent TblVer InQ OutQ  Up/Down State/PfxRcd PfxSnt
bird1(2001:db8::11) 4 65000     548     495      0   0    0 00:03:06            0      0
bird2(2001:db8::12) 4 65000     281     256      0   0    0 00:01:49            0      0
frr2(2001:db8::22)  4 65000     501     501      0   0    0 00:24:56            0      0

Signed-off-by: Vincent Bernat 
---
 proto/bgp/bgp.c |  3 +++
 proto/bgp/bgp.h |  2 ++
 proto/bgp/config.Y  |  1 +
 proto/bgp/packets.c | 38 ++
 4 files changed, 44 insertions(+)

diff --git a/proto/bgp/bgp.c b/proto/bgp/bgp.c
index 302d026c0103..d4c15f0538b1 100644
--- a/proto/bgp/bgp.c
+++ b/proto/bgp/bgp.c
@@ -2415,6 +2415,9 @@ bgp_show_capabilities(struct bgp_proto *p UNUSED, struct bgp_caps *caps)
 bgp_show_afis(-1006, "AF supported:", afl1, afn1);
 bgp_show_afis(-1006, "AF preserved:", afl2, afn2);
   }
+
+  if (*caps->hostname)
+cli_msg(-1006, "  Hostname: %s", caps->hostname);
 }
 
 static void
diff --git a/proto/bgp/bgp.h b/proto/bgp/bgp.h
index dd7dc28fd03d..3a99bc9667b5 100644
--- a/proto/bgp/bgp.h
+++ b/proto/bgp/bgp.h
@@ -98,6 +98,7 @@ struct bgp_config {
   int enable_refresh;  /* Enable local support for route refresh [RFC 2918] */
   int enable_as4;  /* Enable local support for 4B AS numbers [RFC 6793] */
   int enable_extended_messages;/* Enable local support for extended messages [draft] */
+  int enable_hostname; /* Enable local support for hostname [draft] */
   u32 rr_cluster_id;   /* Route reflector cluster ID, if different from local ID */
   int rr_client;   /* Whether neighbor is RR client of me */
   int rs_client;   /* Whether neighbor is RS client of me */
@@ -225,6 +226,7 @@ struct bgp_caps {
   u16 gr_time; /* Graceful restart time in seconds */
 
   u8 llgr_aware;   /* Long-lived GR capability, RFC draft */
+  char hostname[UINT8_MAX + 1];/* Hostname, RFC draft */
   u8 any_ext_next_hop; /* Bitwise OR of per-AF ext_next_hop */
   u8 any_add_path; /* Bitwise OR of per-AF add_path */
 
diff --git a/proto/bgp/config.Y b/proto/bgp/config.Y
index 18c3560dfc9b..f8bd4ef03b11 100644
--- a/proto/bgp/config.Y
+++ b/proto/bgp/config.Y
@@ -62,6 +62,7 @@ bgp_proto_start: proto_start BGP {
  BGP_CFG->error_delay_time_max = 300;
  BGP_CFG->enable_refresh = 1;
  BGP_CFG->enable_as4 = 1;
+ BGP_CFG->enable_hostname = 1;
  BGP_CFG->capabilities = 2;
  BGP_CFG->interpret_communities = 1;
  BGP_CFG->allow_as_sets = 1;
diff --git a/proto/bgp/packets.c b/proto/bgp/packets.c
index 78fdd1e006a3..71ab1be48b46 100644
--- a/proto/bgp/packets.c
+++ b/proto/bgp/packets.c
@@ -11,6 +11,7 @@
 #undef LOCAL_DEBUG
 
 #include 
+#include <sys/utsname.h>
 
 #include "nest/bird.h"
 #include "nest/iface.h"
@@ -221,6 +222,8 @@ bgp_prepare_capabilities(struct bgp_conn *conn)
   struct bgp_channel *c;
   struct bgp_caps *caps;
   struct bgp_af_caps *ac;
+  struct utsname uts = {};
+  size_t length;
 
   if (!p->cf->capabilities)
   {
@@ -252,6 +255,16 @@ bgp_prepare_capabilities(struct bgp_conn *conn)
   if (p->cf->llgr_mode)
 caps->llgr_aware = 1;
 
+  if (p->cf->enabl

Re: [BIRD 2.0.x] Netlink: ignore dead routes

2021-01-15 Thread Vincent Bernat
 ❦ 15 January 2021 05:39 +01, Ondrej Zajicek:

>> It is more complex that I would have expected. First, in-kernel, the
>> next-hop only has RTNH_F_LINKDOWN, not RTNH_F_DEAD. This later flag is
>> added when sending the flags over netlink only.
>> 
>> Second, there is no async notification when a route goes down either.
>> There is a notification on the interface. How BIRD handles this case? Is
>> a route scan triggered when an interface goes down? I'll test more
>> later, it's a bit late for me.
>
> Hi
>
> Yes, scan is triggered in krt_if_notify() for iface-admin-down event.
> Perhaps we can also trigger scan for iface-link-down event.

Hello,

You mean this part in krt.c?

  if ((flags & IF_CHANGE_DOWN) && KRT_CF->learn)
krt_scan_timer_kick(p);

I was also confused by the debug code in iface.c:

  if (i->flags & IF_ADMIN_UP)
debug(" LINK-UP");

I think it should print ADMIN-UP, and a separate check for IF_LINK_UP
should be added.

I can test such a change in a few days.
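
To make the suggestion concrete, here is a small standalone sketch of the two separate checks. It is not BIRD's code: the flag values and the function name format_iface_flags are assumptions made for this example, and it writes into a buffer rather than calling BIRD's debug().

```c
#include <stddef.h>
#include <stdio.h>

/* Flag values assumed for this example; BIRD defines its own. */
#define IF_ADMIN_UP 0x20
#define IF_LINK_UP  0x40

/* Write the names of the set flags into buf, one check per flag,
 * so that ADMIN-UP and LINK-UP are reported independently. */
void
format_iface_flags(unsigned int flags, char *buf, size_t len)
{
  snprintf(buf, len, "%s%s",
           (flags & IF_ADMIN_UP) ? " ADMIN-UP" : "",
           (flags & IF_LINK_UP)  ? " LINK-UP"  : "");
}
```

An interface that is administratively up but has no carrier would then show only " ADMIN-UP", which is the distinction that matters for the iface-link-down case discussed above.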
-- 
Don't stop at one bug.
- The Elements of Programming Style (Kernighan & Plauger)


Re: [BIRD 2.0.x] Netlink: ignore dead routes

2021-01-14 Thread Vincent Bernat
 ❦ 14 January 2021 08:23 +01, Vincent Bernat:

>> Although it would make sense to handle dead routes as withdraws instead
>> of just ingore them (for async notification), it does not matter for sync
>> scan, and as i noticed during testing, Linux kernel does not send async
>> notifications (when the flag changes to dead) anyways, so it does not
>> really matter.
>
> It would makes sense to fix Linux for that. I'll try to send a patch and
> ping here again if it gets accepted.

It is more complex than I would have expected. First, in-kernel, the
next-hop only has RTNH_F_LINKDOWN, not RTNH_F_DEAD. The latter flag is
added only when sending the flags over netlink.

Second, there is no async notification when a route goes down either.
There is a notification on the interface. How does BIRD handle this
case? Is a route scan triggered when an interface goes down? I'll test
more later, it's a bit late for me.
-- 
He that breaks a thing to find out what it is has left the path of wisdom.
-- J.R.R. Tolkien


Re: [BIRD 2.0.x] Netlink: ignore dead routes

2021-01-13 Thread Vincent Bernat
 ❦ 14 January 2021 04:18 +01, Ondrej Zajicek:

> Although it would make sense to handle dead routes as withdraws instead
> of just ingore them (for async notification), it does not matter for sync
> scan, and as i noticed during testing, Linux kernel does not send async
> notifications (when the flag changes to dead) anyways, so it does not
> really matter.

It would make sense to fix Linux for that. I'll try to send a patch and
ping here again if it gets accepted.
-- 
Make sure comments and code agree.
- The Elements of Programming Style (Kernighan & Plauger)


Re: [BIRD 2.0.x] Netlink: ignore dead routes

2020-10-23 Thread Vincent Bernat
 ❦ 23 October 2020 16:17 +02, Bernd Naumann:

> So the issue unfoldes if I get routes via ${hop} and I'm supposed to
> reach ${hop} on an interface (using its device route for example) but 
> the device is down? So bird will happily exports these routes to the
> kernel, but the kernel is never able to send packets on that link 
> because its down. Do I understand this correctly?

My use case is the reverse: you have a static route with a nexthop in
the kernel, but the target interface is down. I don't want BIRD to use
and advertise this route.
-- 
Nothing so needs reforming as other people's habits.
-- Mark Twain


Re: [BIRD 2.0.x] Netlink: ignore dead routes

2020-10-23 Thread Vincent Bernat
 ❦ 23 October 2020 08:48 +02, Bernd Naumann:

> I have a question:
> What is then `check link` supposed to do?
>
> At least for 1.6, babel is the only protocol which enables it by
> default, and the others, for in example direct, static, and ospf it is 
> needed to be set by the user, and I would have assumed exactly that
> behavior.

`check link` does not seem to exist for the kernel protocol. It could be
an option, but IMO, this is a separate issue: a route the kernel won't
use shouldn't be used by BIRD either, so the check for the "dead" flag
should be done in all cases.
-- 
Don't stop at one bug.
- The Elements of Programming Style (Kernighan & Plauger)


[BIRD 2.0.x] Netlink: ignore dead routes

2020-10-22 Thread Vincent Bernat
With net.ipv4.conf.XXX.ignore_routes_with_linkdown sysctl, a user can
ensure the kernel does not use a route whose target interface is down.
The route is marked with a "dead"/RTNH_F_DEAD flag. Currently, BIRD
still uses and distributes this route. This patch just ignores such a
route.

This patch could be backported to 1.6.x.

Signed-off-by: Vincent Bernat 
---
 sysdep/linux/netlink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sysdep/linux/netlink.c b/sysdep/linux/netlink.c
index f85bcf35685b..c28126510e6e 100644
--- a/sysdep/linux/netlink.c
+++ b/sysdep/linux/netlink.c
@@ -1690,6 +1690,9 @@ nl_parse_route(struct nl_parse_state *s, struct nlmsghdr *h)
  if (i->rtm_flags & RTNH_F_ONLINK)
ra->nh.flags |= RNF_ONLINK;
 
+  if (i->rtm_flags & RTNH_F_DEAD)
+return;
+
  neighbor *nbr;
  nbr = neigh_find(&p->p, ra->nh.gw, ra->nh.iface,
   (ra->nh.flags & RNF_ONLINK) ? NEF_ONLINK : 0);
-- 
2.28.0
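
The check added by this patch can be tried in isolation with the sketch below. The helper name route_is_usable is hypothetical; the RTNH_F_* constants come from the Linux UAPI header, so this only builds on Linux with the kernel headers installed.

```c
#include <linux/rtnetlink.h>

/* Return 1 if a route carrying these flags should be imported,
 * 0 if the kernel has marked it as dead. */
int
route_is_usable(unsigned int rtm_flags)
{
  return (rtm_flags & RTNH_F_DEAD) ? 0 : 1;
}
```

Other flags such as RTNH_F_ONLINK do not affect the result; only the dead flag causes the route to be skipped, as in the patch.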



[BIRD 1.6.x] Unix: fix compilation with GCC 10

2020-09-28 Thread Vincent Bernat
GCC 10 now defaults to -fno-common, so a global variable defined in a
header that is included from several files triggers a link error:
 https://gcc.gnu.org/gcc-10/porting_to.html#common

Fix this issue by declaring the variable as `extern' in `krt.h'. The
variable is actually defined in `krt.c'.

LD -r -o all.o cf-parse.tab.o cf-lex.o conf.o
/usr/bin/ld: 
cf-lex.o:/home/bernat/code/exoscale/bird/build~/conf/../lib/krt.h:115: multiple 
definition of `kif_proto'; 
cf-parse.tab.o:/home/bernat/code/exoscale/bird/build~/conf/../lib/krt.h:115: 
first defined here

This issue was already fixed in BIRD 2.x.
---
 sysdep/unix/krt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdep/unix/krt.h b/sysdep/unix/krt.h
index d4a8717e38ec..fe79efc3717d 100644
--- a/sysdep/unix/krt.h
+++ b/sysdep/unix/krt.h
@@ -112,7 +112,7 @@ struct kif_proto {
   struct kif_state sys;/* Sysdep state */
 };
 
-struct kif_proto *kif_proto;
+extern struct kif_proto *kif_proto;
 
 #define KIF_CF ((struct kif_config *)p->p.cf)
 
-- 
2.28.0
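
The pattern behind this fix, reduced to a single file for illustration, looks as follows. The struct member and the accessor kif_proto_is_set are placeholders invented for the example; only the `extern' declaration and the single definition mirror the patch.

```c
#include <stddef.h>

/* --- What would live in the header (krt.h in the patch) --- */

struct kif_proto {
  int placeholder;      /* stand-in member; the real struct differs */
};

/* Declaration only: no storage is allocated, so every file that
 * includes the header refers to the same single object. */
extern struct kif_proto *kif_proto;

/* --- What would live in exactly one .c file (krt.c in the patch) --- */

/* The one and only definition. */
struct kif_proto *kif_proto = NULL;

/* Hypothetical accessor, just to have something observable. */
int
kif_proto_is_set(void)
{
  return kif_proto != NULL;
}
```

Without `extern', each file including the header would carry its own tentative definition, which GCC 10 no longer merges by default, hence the "multiple definition" error shown above.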



Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Vincent Bernat
 ❦  3 December 2019 12:48 +01, Alarig Le Lay:

>> It's not unexpected. A cache entry is for a /128.
>
> When I’m routing 80k prefixes I don’t want to have n /128 routes because
> someone doesn’t have 1500 of MTU. Is their a way to disable this
> behaviour?

I don't think there is. The information needs to be stored somewhere.
With IPv6, they are materialized as regular route entries tagged as
"cached routes". With IPv4, they are stored inside a route entry.
-- 
Don't stop with your first draft.
- The Elements of Programming Style (Kernighan & Plauger)


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Vincent Bernat
 ❦  3 December 2019 11:46 +01, Alarig Le Lay:

> So, I have more routes in cache than in FIB on my two core routers, I’m
> pretty sure there is a bug there :p

It's not unexpected. A cache entry is for a /128.

> I have less routes in cache on 4.14 kernels but more traffic.
>
> I don’t know which function is feeding the cache, but I think that it’s
> doing too much.

The function is ip6_rt_cache_alloc(). It is called on PMTU exceptions,
on redirects, and in this last case, which I currently fail to
understand:

> ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set
> 
> This patch always creates RTF_CACHE clone with DST_NOCACHE
> when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to
> the fl6->daddr.


-- 
It is a wise father that knows his own child.
-- William Shakespeare, "The Merchant of Venice"


Re: IPv6 BGP & kernel 4.19

2019-12-03 Thread Vincent Bernat
 ❦  3 December 2019 08:56 +01, Alarig Le Lay:

>> Just to be clear: I did forget this fact and therefore my initial
>> recommendation to increase max_size with more than 4096 active hosts
>> does not apply anymore (as long as you have a 4.2+ kernel). Keep the
>> default value and watch `/proc/net/rt6_stats`.
>
> core01-arendal ~ # cat /proc/net/rt6_stats
> 0048 002c 5e56 0050  0056 0020
>
> It is supposed to be understandable? :D

So, there are 0x56 entries in the cache. Isn't that clear? :)

https://elixir.bootlin.com/linux/latest/source/net/ipv6/route.c#L6006
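
For anyone who prefers not to read the hexadecimal by eye, here is a small sketch that parses one line in this format. It assumes the file holds seven space-separated hexadecimal fields, which is my reading of the kernel source linked above; the function name parse_rt6_stats is made up for the example.

```c
#include <stdio.h>

/* Parse one line in the /proc/net/rt6_stats format, assumed to be
 * seven hexadecimal fields separated by spaces.
 * Returns 0 on success, -1 if fewer than seven fields are found. */
int
parse_rt6_stats(const char *line, unsigned int fields[7])
{
  int n = sscanf(line, "%x %x %x %x %x %x %x",
                 &fields[0], &fields[1], &fields[2], &fields[3],
                 &fields[4], &fields[5], &fields[6]);

  return (n == 7) ? 0 : -1;
}
```

After a successful call, fields[5] is the second-to-last value (the cache entries discussed here) and fields[3] is the fourth value it is being compared with.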

-- 
Modularise.  Use subroutines.
- The Elements of Programming Style (Kernighan & Plauger)


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Vincent Bernat
 ❦  2 December 2019 22:48 +01, Vincent Bernat:

> Also, from 4.2, the cache entries are only created for exceptions (PMTU
> notably). So, in fact, the initial value should be mostly safe. You can
> monitor it with `/proc/net/rt6_stats`. This is the second-to-last value. If
> you can share what you have, I would be curious to know how low it is
> (compared to the 4th entry notably).

Just to be clear: I did forget this fact and therefore my initial
recommendation to increase max_size with more than 4096 active hosts
does not apply anymore (as long as you have a 4.2+ kernel). Keep the
default value and watch `/proc/net/rt6_stats`.
-- 
Program defensively.
- The Elements of Programming Style (Kernighan & Plauger)


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Vincent Bernat
 ❦  2 December 2019 21:58 +01, Alarig Le Lay:

>> For IPv6, this is the size of the routing cache. If you have more than
>> 4096 active hosts, Linux will aggressively try to run garbage
>> collection, eating CPU. In this case, increase both
>> net.ipv6.route.max_size and net.ipv6.route.gc_thresh.
>
> Do you know what are the risks when we raise those parameters? A bit
> more RAM consumption?

You are mostly safe with RAM. Increasing the value to 512k would eat
256MB of RAM. However, if an attacker is still able to overflow the
cache, it is costly in terms of CPU. This is a bit similar to the route
cache for IPv4, so you need to play with threshold, interval and timeout
to try to keep CPU usage down, but ultimately, a fast enough attacker
can do a lot of damage here. I don't have real-life experience with this
aspect.
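
As a rough cross-check of the figure above: 512k entries for 256MB works out to about 512 bytes per entry. The sketch below only encodes that arithmetic; the per-entry size is inferred from the numbers in this message, not measured on a kernel, and the names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: per-entry cost implied by "512k entries -> 256MB". */
#define ASSUMED_BYTES_PER_ENTRY 512u

/* Estimated worst-case memory for a cache of max_size entries. */
uint64_t route_cache_bytes(uint64_t max_size)
{
  assert(max_size <= UINT64_MAX / ASSUMED_BYTES_PER_ENTRY); /* no overflow */
  return max_size * ASSUMED_BYTES_PER_ENTRY;
}
```

With the default of 4096 entries, the same estimate gives 2MB.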

Also, from 4.2, the cache entries are only created for exceptions (PMTU
notably). So, in fact, the initial value should be mostly safe. You can
monitor it with `/proc/net/rt6_stats`. This is the second-to-last value. If
you can share what you have, I would be curious to know how low it is
(compared to the 4th entry notably).
-- 
Writing is turning one's worst moments into money.
-- J.P. Donleavy


Re: IPv6 BGP & kernel 4.19

2019-12-02 Thread Vincent Bernat
 ❦  1 December 2019 19:20 +01, Clément Guivy:

> Hi, that's good news. One thing that still confuses me though is that
> the default values for these settings are the same in Debian 9 (4.9
> kernel) and Debian 10 (4.19 kernel), so I would expect the behaviour
> to be the same between both versions in that regard.
> Also I'm not sure to understand what this max_size parameter actually
> does since I have it to default value (4096), and yet ipv6 route table
> at the moment is >70k entries large without the kernel complaining.

For IPv4, the parameter is ignored since Linux 3.6. For IPv6, this is
the size of the routing cache. If you have more than 4096 active hosts,
Linux will aggressively try to run garbage collection, eating CPU. In
this case, increase both net.ipv6.route.max_size and
net.ipv6.route.gc_thresh. Unfortunately, this value is not easily
observable, so it's hard to know when you hit it. Also, while IPv4
recently regained the ability to enumerate the cache, this is not the
case for IPv6.

This setting is a bit confusing as it is not documented and, before
Linux 3.0, it limited the whole IPv6 route table.
-- 
Write clearly - don't sacrifice clarity for "efficiency".
- The Elements of Programming Style (Kernighan & Plauger)


Re: Is BIRD on BSD a second class citizen?

2019-10-31 Thread Vincent Bernat
 ❦ 31 October 2019 08:54 +00, k simon:

> FreeBSD does not support MPLS and VRF, but it support ECMP by
> recompile kernel with “options RADIX_MPATH”, and quagga/frr have
> supported it for few years .

It seems this option is broken since FreeBSD 11. See

-- 
They have been at a great feast of languages, and stolen the scraps.
-- William Shakespeare, "Love's Labour's Lost"


Re: BGP graceful restart for software update only?

2019-10-17 Thread Vincent Bernat
 ❦ 17 October 2019 11:29 +01, Neil Jerram:

> In my setup, an instance of BIRD runs all the time, except for when it
> needs to be restarted for a software update.
>
> For that update scenario, I'd like BGP graceful restart to apply, so that
> the stop-update-restart process does not cause the routes advertised by
> this BIRD to be withdrawn from the rest of the BGP network.
>
> For all other scenarios, however, I don't want any graceful restart.
> Specifically, if there's a break in connectivity to a BGP peer, I want to
> detect that as quickly as possible (with BFD), locally to remove the routes
> learned from that peer, and for that peer to remove routes learned from me,
> all immediately.
>
> Is there some combination of configuration and procedure that can provide
> both of those desires?

You should look at the long-lived graceful restart alternative. It will
let you do software upgrades without impact and without keeping routes
around when a BGP session is cut unexpectedly, as long as you have
alternative routes available.
-- 
Terminate input by end-of-file or marker, not by count.
- The Elements of Programming Style (Kernighan & Plauger)


[PATCH 3/3] BSD: add support for ttl security and IPv6

2019-08-12 Thread Vincent Bernat
FreeBSD uses the same value as IPv4, set with IP_MINTTL, for IPv6. See:

---
 sysdep/bsd/sysio.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sysdep/bsd/sysio.h b/sysdep/bsd/sysio.h
index 545276a37c3a..f02b80945ba2 100644
--- a/sysdep/bsd/sysio.h
+++ b/sysdep/bsd/sysio.h
@@ -245,9 +245,9 @@ sk_set_min_ttl4(sock *s, int ttl)
 }
 
 static inline int
-sk_set_min_ttl6(sock *s, int ttl UNUSED)
+sk_set_min_ttl6(sock *s, int ttl)
 {
-  ERR_MSG("Kernel does not support IPv6 TTL security");
+  return sk_set_min_ttl4(s, ttl);
 }
 
 static inline int
-- 
2.23.0.rc1



[PATCH 1/3] Doc: git clone the legacy branch for BIRD 1.6.x

2019-08-12 Thread Vincent Bernat
---
 INSTALL | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/INSTALL b/INSTALL
index b3f66f135122..e48a53f7e8c6 100644
--- a/INSTALL
+++ b/INSTALL
@@ -15,7 +15,7 @@ To compile current development BIRD source code from Git repository, you
 also need Git (to download the source code) and Autoconf (to generate
 the configure script and associated files using 'autoreconf' tool):
 
-$ git clone https://gitlab.labs.nic.cz/labs/bird/
+$ git clone https://gitlab.labs.nic.cz/labs/bird/ -b legacy
 $ cd bird
 $ autoreconf
 
-- 
2.23.0.rc1



Some patches for FreeBSD

2019-08-12 Thread Vincent Bernat
Hey!

Here are a few patches for FreeBSD. They target the "legacy"
branch, but the last two can be applied to the master branch as
well.



[PATCH 2/3] BSD: don't complain about interfaces without IP addresses

2019-08-12 Thread Vincent Bernat
Without this patch, BIRD complains about interfaces without an IP
address:

2019-08-12 07:41:26  KIF: Invalid interface address 0.0.0.0 for vtnet2
---
 sysdep/bsd/krt-sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sysdep/bsd/krt-sock.c b/sysdep/bsd/krt-sock.c
index f0cebd11c5b7..f92cf1a68dc6 100644
--- a/sysdep/bsd/krt-sock.c
+++ b/sysdep/bsd/krt-sock.c
@@ -651,6 +651,8 @@ krt_read_addr(struct ks_msg *msg, int scan)
   imask = ipa_from_sa();
   ibrd  = ipa_from_sa();
 
+  if (ipa_zero(iaddr))
+    return;
 
   if ((masklen = ipa_masklen(imask)) < 0)
   {
-- 
2.23.0.rc1



Re: BIRD 2.0.5 and 1.6.7

2019-08-05 Thread Vincent Bernat
 ❦  5 August 2019 18:28 +02, Ondrej Zajicek:

>> You mean I can do:
>> 
>> protocol bgp XXX {
>>  neighbor;
>>  interface eth0;
>>  /* ... */
>> }
>> 
>> ?
>
> No, you need to specify neighbor link-local address
> (i probably misunderstood your point):
>
> protocol bgp XXX {
>   neighbor fe80::1 external;
>   interface eth0;
> }

Oh, OK, not as convenient as specifying an interface. I wonder how FRR
is getting the remote IP. Maybe it's automatically in the neighbor table
due to neighbor advertisements?

> It is a bit ugly for a PtP link, It is true that a simple BGP protocol
> that accept any peer IP from that interface could also makes sense.
> I did not notice that.

If both ends could be configured the same way, it would be great. But,
then, at least one of them would need to not be passive.
-- 
Use recursive procedures for recursively-defined data structures.
- The Elements of Programming Style (Kernighan & Plauger)


Re: BIRD 2.0.5 and 1.6.7

2019-08-05 Thread Vincent Bernat
 ❦  5 August 2019 17:24 +02, Ondrej Zajicek:

>> >   o BGP: Dynamic BGP
>> >   o BGP: Promiscuous ASN mode
>> 
>> That's great! Is there a roadmap for additional features around that?
>
> In future, I would like to implement automatic BGP neighbor discovery,
> like draft-xu-idr-neighbor-autodiscovery-10.
>
>> Notably:
>> 
>>  - establish a BGP session using an interface name and the associated
>>link-local IPv6 address,
>
> This is already supported since long time.

You mean I can do:

protocol bgp XXX {
 neighbor;
 interface eth0;
 /* ... */
}

?

>>  - implement RFC5549 (IPv4 NLRI with an IPv6 next-hop)
>
> This is supported since 2.0.0 in BGP, but there is still no support
> in Linux kernel (AFAIK) and in Kernel protocol.

Cumulus chose to implement it without support in the kernel by using
link-local IPv4 addresses and static ARP entries. I don't know how
standard and interoperable this is.
-- 
Choose a data representation that makes the program simple.
- The Elements of Programming Style (Kernighan & Plauger)


Re: BIRD 2.0.5 and 1.6.7

2019-08-05 Thread Vincent Bernat
 ❦  5 August 2019 09:54 +02, Ondrej Filip:

>   o BGP: Dynamic BGP
>   o BGP: Promiscuous ASN mode

That's great! Is there a roadmap for additional features around that?
Notably:

 - establish a BGP session using an interface name and the associated
   link-local IPv6 address,
 
 - implement RFC5549 (IPv4 NLRI with an IPv6 next-hop)

Thanks!
-- 
Make input easy to proofread.
- The Elements of Programming Style (Kernighan & Plauger)


[PATCH] RPKI: fix allocation of hostname when using an IPv6 address

2019-07-29 Thread Vincent Bernat
---
 proto/rpki/config.Y | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proto/rpki/config.Y b/proto/rpki/config.Y
index a88a29a1c004..63c7105cd073 100644
--- a/proto/rpki/config.Y
+++ b/proto/rpki/config.Y
@@ -97,7 +97,7 @@ rpki_cache_addr:
  rpki_check_unused_hostname();
  RPKI_CFG->ip = $1;
  /* Ensure hostname is filled */
- char *hostname = cfg_allocz(sizeof(INET6_ADDRSTRLEN + 1));
+ char *hostname = cfg_allocz(INET6_ADDRSTRLEN + 1);
  bsnprintf(hostname, INET6_ADDRSTRLEN+1, "%I", RPKI_CFG->ip);
  RPKI_CFG->hostname = hostname;
}
-- 
2.22.0
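
The bug fixed here is easy to miss: `sizeof` applied to an integer expression yields the size of its type, not its value. A minimal sketch, using only the standard `INET6_ADDRSTRLEN` constant (the function names are made up for the example):

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* What the old code asked for: the size of an int expression. */
size_t old_alloc_size(void)
{
  return sizeof(INET6_ADDRSTRLEN + 1);
}

/* What the patch asks for: room for any IPv6 address string. */
size_t new_alloc_size(void)
{
  return INET6_ADDRSTRLEN + 1;
}

/* The old size is too small for what is later written into the buffer. */
void check_alloc_sizes(void)
{
  assert(old_alloc_size() == sizeof(int));
  assert(new_alloc_size() > old_alloc_size());
}
```

So the buffer was only `sizeof(int)` bytes long (typically 4), while `bsnprintf()` was told it could write up to `INET6_ADDRSTRLEN+1` bytes into it.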



Re: Crash when filtering routes in BGP protocol

2019-07-06 Thread Vincent Bernat
Hey!

I am unsure if my message was successfully delivered to the appropriate
people (maybe it was filtered due to DKIM).
-- 
Follow each decision as closely as possible with its associated action.
- The Elements of Programming Style (Kernighan & Plauger)

 ――― Original Message ―――
 From: Vincent Bernat 
 Sent: 25 June 2019 19:57 +02
 Subject: Crash when filtering routes in BGP protocol
 To: bird-users

> Hey!
>
> When filtering routes in BGP, I get the following crash with BIRD master:
>
> #v+
> Program received signal SIGSEGV, Segmentation fault.
> 0x5558ccdd in rta_free (r=0x5558adc0 ) at 
> ../nest/route.h:643
> 643 static inline void rta_free(rta *r) { if (r && !--r->uc) 
> rta__free(r); }
> gdb$  bt full
> #0  0x5558ccdd in rta_free (r=0x5558adc0 ) at 
> ../nest/route.h:643
> No locals.
> #1  rte_update2 (c=0x555f3de0, n=0x7fffe2f0, n@entry=0x7fffe260, 
> new=, src=0x555fec00) at ../nest/rt-table.c:1589
> old_attrs = 0x5558adc0 
> fr = 
> p = 
> stats = 0x555f3e78
> filter = 0x555ed980
> dummy = 0x0
> nn = 0x7fffe210
> #2  0x5559ca0a in rte_update3 (src=<optimized out>, new=<optimized out>, n=<optimized out>, c=<optimized out>) at ../nest/protocol.h:628
> No locals.
> #3  bgp_rte_update (s=s@entry=0x7fffe350, n=n@entry=0x7fffe2f0, 
> path_id=path_id@entry=4294959812, a0=a0@entry=0x0) at 
> ../proto/bgp/packets.c:1267
> a = 
> e = 
> #4  0x5559d6dd in bgp_decode_nlri_ip6 (s=0x7fffe350, 
> pos=, len=, a=0x0) at 
> ../proto/bgp/packets.c:1500
> net = {type = 2 '\002', pxlen = 48 '0', length = 20, prefix = {addr = 
> {536939960, 3722248192, 0, 0}}}
> path_id = 4294959812
> l = 48
> addr = {addr = {308786, 56797, 0, 0}}
> b = 
> #5  0x5559aced in bgp_decode_nlri (s=s@entry=0x7fffe460, 
> afi=, nlri=0x556034d0 "0 \001\r\270\335\335\060 
> \001\r\270\314\314@\001\001", len=14, ea=ea@entry=0x556065f0, 
> nh=, nh_len=32) at ../proto/bgp/packets.c:2351
> c = 0x555f3de0
> a = 0x7fffe350
> #6  0x5559ed64 in bgp_rx_update (conn=conn@entry=0x555f3cd8, 
> pkt=pkt@entry=0x55603490 '\377' , len=91) at 
> ../proto/bgp/packets.c:2448
> p = 
> ea = 0x556065f0
> s = {proto = 0x555f3ad0, channel = 0x555f3de0, pool = 
> 0x556019c0, as4_session = 1, add_path = 0, mpls = 0, attrs_seen = {16390, 
> 0, 0, 0, 0, 0, 0, 0}, mp_reach_af = 131073, mp_unreach_af = 0, attr_len = 68, 
> ip_reach_len = 0, ip_unreach_len = 0, ip_next_hop_len = 0, mp_reach_len = 14, 
> mp_unreach_len = 0, mp_next_hop_len = 32, attrs = 0x556034a7 "\220\016", 
> ip_reach_nlri = 0x556034eb '\377' , ip_unreach_nlri = 
> 0x556034a5 "", ip_next_hop_data = 0x0, mp_reach_nlri = 0x556034d0 "0 
> \001\r\270\335\335\060 \001\r\270\314\314@\001\001", mp_unreach_nlri = 0x0, 
> mp_next_hop_data = 0x556034af " \001\r\270\252\252", err_withdraw = 0, 
> err_subcode = 0, err_jmpbuf = {{__jmpbuf = {93824992885456, 
> -942560477419964727, 93824992949408, 93824992885976, 0, 93824992949392, 
> 942560477161682633, 6359628643728717513}, __mask_was_saved = 0, __saved_mask 
> = {__val = {0 , hostentry = 0x0, mpls_labels = 0x0, 
> last_id = 0, last_src = 0x555fec00, cached_rta = 0x556075c8}
> pos = 
> #7  0x5559fadb in bgp_rx_packet (len=, 
> pkt=0x55603490 '\377' , conn=0x555f3cd8) at 
> ../proto/bgp/packets.c:3024
> type = 2 '\002'
> type = 
> #8  bgp_rx (sk=0x55601bb0, size=) at 
> ../proto/bgp/packets.c:3069
> conn = 0x555f3cd8
> pkt_start = 0x55603490 '\377' 
> end = 0x55603508 ""
> i = 
> len = 
> #9  0x555a48da in call_rx_hook (s=0x55601bb0, size=<optimized out>) at ../sysdep/unix/io.c:1794
> No locals.
> #10 0x555a6db7 in sk_read (s=s@entry=0x55601bb0, revents=1) at 
> ../sysdep/unix/io.c:1882
> c = 
> #11 0x555a781e in io_loop () at ../sysdep/unix/io.c:2344
> s = 
> count = 1
> poll_tout = 
> timeout = 
> nfds = 
> events = 
> pout = 
> t = 
> s = 
> n = 
> fdmax = 256
> pfd = 0x55601010
> #12 0x55560f53 in main (argc=, argv=) 
> at ../sysdep/unix/main.c:906
> use_uid = 
> use_gid = 
> conf = 0x555eca10
> #v-
>
>
> Minimal configuration is:
>
> #v+
> log "/var/log/bird.log" all;
> router id 2.2.2.2;
>
> filter validated {
>reject;
> }
>
> protocol device {
> }
>
> protocol bgp {
>local as 65001;
>neighbor 2001:db8:::0 as 65000;
>ipv6 {
>   import filter validated;
>   export none;
>};
> }
> #v-
>
> I have tried to fix that by initializing `old_attrs` to NULL, but this
> leads to crash elsewhere. Since I don't know what a temporary attribute
> is, I may miss the whole picture.



[PATCH] Doc: fix typo in BGP dynamic names feature description

2019-07-06 Thread Vincent Bernat
---
 doc/bird.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/bird.sgml b/doc/bird.sgml
index edc871fbe7e4..715bf1cf3937 100644
--- a/doc/bird.sgml
+++ b/doc/bird.sgml
@@ -2246,7 +2246,7 @@ using the following configuration parameters:
 
dynamic name "
Define common prefix of names used for new BGP instances spawned when
-   dynamic BGP behavior is active. Actual names also contain numberic
+   dynamic BGP behavior is active. Actual names also contain numeric
index to distinguish individual instances.  Default: "dynbgp".
 
dynamic name digits 
-- 
2.20.1



Re: Crash when filtering routes in BGP protocol

2019-06-25 Thread Vincent Bernat
 ❦ 25 June 2019 19:57 +02, Vincent Bernat:

> When filtering routes in BGP, I get the following crash with BIRD
> master:

Bisected to:

commit 875cc073b067f295668008e10218f8e98dd3
Author: Ondrej Zajicek (work) 
Date:   Thu Mar 14 17:22:22 2019 +0100

Nest: Update handling of temporary attributes
-- 
It is often the case that the man who can't tell a lie thinks he is the best
judge of one.
-- Mark Twain, "Pudd'nhead Wilson's Calendar"



Crash when filtering routes in BGP protocol

2019-06-25 Thread Vincent Bernat
Hey!

When filtering routes in BGP, I get the following crash with BIRD master:

#v+
Program received signal SIGSEGV, Segmentation fault.
0x5558ccdd in rta_free (r=0x5558adc0 ) at 
../nest/route.h:643
643 static inline void rta_free(rta *r) { if (r && !--r->uc) rta__free(r); }
gdb$  bt full
#0  0x5558ccdd in rta_free (r=0x5558adc0 ) at 
../nest/route.h:643
No locals.
#1  rte_update2 (c=0x555f3de0, n=0x7fffe2f0, n@entry=0x7fffe260, 
new=, src=0x555fec00) at ../nest/rt-table.c:1589
old_attrs = 0x5558adc0 
fr = 
p = 
stats = 0x555f3e78
filter = 0x555ed980
dummy = 0x0
nn = 0x7fffe210
#2  0x5559ca0a in rte_update3 (src=, new=, n=, c=) at ../nest/protocol.h:628
No locals.
#3  bgp_rte_update (s=s@entry=0x7fffe350, n=n@entry=0x7fffe2f0, 
path_id=path_id@entry=4294959812, a0=a0@entry=0x0) at 
../proto/bgp/packets.c:1267
a = 
e = 
#4  0x5559d6dd in bgp_decode_nlri_ip6 (s=0x7fffe350, pos=, len=, a=0x0) at ../proto/bgp/packets.c:1500
net = {type = 2 '\002', pxlen = 48 '0', length = 20, prefix = {addr = 
{536939960, 3722248192, 0, 0}}}
path_id = 4294959812
l = 48
addr = {addr = {308786, 56797, 0, 0}}
b = 
#5  0x5559aced in bgp_decode_nlri (s=s@entry=0x7fffe460, 
afi=, nlri=0x556034d0 "0 \001\r\270\335\335\060 
\001\r\270\314\314@\001\001", len=14, ea=ea@entry=0x556065f0, nh=, nh_len=32) at ../proto/bgp/packets.c:2351
c = 0x555f3de0
a = 0x7fffe350
#6  0x5559ed64 in bgp_rx_update (conn=conn@entry=0x555f3cd8, 
pkt=pkt@entry=0x55603490 '\377' , len=91) at 
../proto/bgp/packets.c:2448
p = 
ea = 0x556065f0
s = {proto = 0x555f3ad0, channel = 0x555f3de0, pool = 
0x556019c0, as4_session = 1, add_path = 0, mpls = 0, attrs_seen = {16390, 
0, 0, 0, 0, 0, 0, 0}, mp_reach_af = 131073, mp_unreach_af = 0, attr_len = 68, 
ip_reach_len = 0, ip_unreach_len = 0, ip_next_hop_len = 0, mp_reach_len = 14, 
mp_unreach_len = 0, mp_next_hop_len = 32, attrs = 0x556034a7 "\220\016", 
ip_reach_nlri = 0x556034eb '\377' , ip_unreach_nlri = 
0x556034a5 "", ip_next_hop_data = 0x0, mp_reach_nlri = 0x556034d0 "0 
\001\r\270\335\335\060 \001\r\270\314\314@\001\001", mp_unreach_nlri = 0x0, 
mp_next_hop_data = 0x556034af " \001\r\270\252\252", err_withdraw = 0, 
err_subcode = 0, err_jmpbuf = {{__jmpbuf = {93824992885456, 
-942560477419964727, 93824992949408, 93824992885976, 0, 93824992949392, 
942560477161682633, 6359628643728717513}, __mask_was_saved = 0, __saved_mask = 
{__val = {0 , hostentry = 0x0, mpls_labels = 0x0, last_id 
= 0, last_src = 0x555fec00, cached_rta = 0x556075c8}
pos = 
#7  0x5559fadb in bgp_rx_packet (len=, 
pkt=0x55603490 '\377' , conn=0x555f3cd8) at 
../proto/bgp/packets.c:3024
type = 2 '\002'
type = 
#8  bgp_rx (sk=0x55601bb0, size=) at 
../proto/bgp/packets.c:3069
conn = 0x555f3cd8
pkt_start = 0x55603490 '\377' 
end = 0x55603508 ""
i = 
len = 
#9  0x555a48da in call_rx_hook (s=0x55601bb0, size=) 
at ../sysdep/unix/io.c:1794
No locals.
#10 0x555a6db7 in sk_read (s=s@entry=0x55601bb0, revents=1) at 
../sysdep/unix/io.c:1882
c = 
#11 0x555a781e in io_loop () at ../sysdep/unix/io.c:2344
s = 
count = 1
poll_tout = 
timeout = 
nfds = 
events = 
pout = 
t = 
s = 
n = 
fdmax = 256
pfd = 0x55601010
#12 0x55560f53 in main (argc=, argv=) at 
../sysdep/unix/main.c:906
use_uid = 
use_gid = 
conf = 0x555eca10
#v-

Minimal configuration is:

#v+
log "/var/log/bird.log" all;
router id 2.2.2.2;

filter validated {
   reject;
}

protocol device {
}

protocol bgp {
   local as 65001;
   neighbor 2001:db8:::0 as 65000;
   ipv6 {
  import filter validated;
  export none;
   };
}
#v-

I have tried to fix that by initializing `old_attrs` to NULL, but this
leads to a crash elsewhere. Since I don't know what a temporary attribute
is, I may be missing the whole picture.
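
The crashing line is a reference-count release. A simplified stand-in (hypothetical types and names, not BIRD's actual `rta`) shows why a NULL pointer is harmless there while an uninitialized one is not:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BIRD's rta: a use count and a flag. */
struct fake_rta {
  unsigned uc;     /* use count */
  int released;    /* set when the last reference is dropped */
};

/* Same shape as the rta_free() line shown in the backtrace. */
static inline void fake_rta_free(struct fake_rta *r)
{
  if (r && !--r->uc)
    r->released = 1;
}

void fake_rta_demo(void)
{
  struct fake_rta a = { .uc = 2, .released = 0 };

  fake_rta_free(NULL);          /* NULL is checked first: no-op */
  fake_rta_free(&a);            /* 2 -> 1, still in use */
  assert(a.uc == 1 && !a.released);
  fake_rta_free(&a);            /* 1 -> 0, released */
  assert(a.released);
}
```

An uninitialized pointer passes the `r` check and the decrement then writes through it, which would be consistent with the SIGSEGV in frame #0. Initializing to NULL only makes this call site a no-op, which may be why the crash moved elsewhere.
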
-- 
Don't diddle code to make it faster - find a better algorithm.
- The Elements of Programming Style (Kernighan & Plauger)



[PATCH] Doc: fix typo in BGP dynamic names feature description

2019-06-25 Thread Vincent Bernat
---
 doc/bird.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/bird.sgml b/doc/bird.sgml
index edc871fbe7e4..715bf1cf3937 100644
--- a/doc/bird.sgml
+++ b/doc/bird.sgml
@@ -2246,7 +2246,7 @@ using the following configuration parameters:
 
dynamic name "
Define common prefix of names used for new BGP instances spawned when
-   dynamic BGP behavior is active. Actual names also contain numberic
+   dynamic BGP behavior is active. Actual names also contain numeric
index to distinguish individual instances.  Default: "dynbgp".
 
dynamic name digits 
-- 
2.20.1



Re: Debian packages for 1.6.5 and 2.0.3

2019-02-06 Thread Vincent Bernat
 ❦  6 February 2019 16:47 +01, Ondrej Zajicek:

> I cannot give precise ETA for our/official packages, as it is a bit
> organizationally complex, depends on multiple people, and we are often
> interrupted by higher priority tasks like bugfixes.

If you are interested and Ondřej Surý doesn't mind, I can maintain
backports for bird and bird2 on a debian.net domain for Debian and on
Launchpad for Ubuntu, using the package sources currently on Salsa. I am
already doing such a thing for HAProxy (see haproxy.debian.net) and it
doesn't take much time.
-- 
Writing is easy; all you do is sit staring at the blank sheet of paper until
drops of blood form on your forehead.
-- Gene Fowler



[PATCH] Unix IO: Set socket priority after setting family specific options

2018-01-22 Thread Vincent Bernat
From: Vincent Bernat <vinc...@bernat.im>

On Linux, setting the ToS will also set the priority and the range of
accepted values is quite limited (masked by 0x1e). Therefore, 0xc0 is
translated to a priority of 0, which is not what we want: it overrides the
priority of 7 that was explicitly set earlier. To avoid that, just
move setting the priority later in the code.
---
 sysdep/unix/io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sysdep/unix/io.c b/sysdep/unix/io.c
index 8773f4c41e86..1c81acbf4518 100644
--- a/sysdep/unix/io.c
+++ b/sysdep/unix/io.c
@@ -1226,10 +1226,6 @@ sk_setup(sock *s)
 #endif
   }
 
-  if (s->priority >= 0)
-    if (sk_set_priority(s, s->priority) < 0)
-      return -1;
-
   if (sk_is_ipv4(s))
   {
 if (s->flags & SKF_LADDR_RX)
@@ -1280,6 +1276,10 @@ sk_setup(sock *s)
return -1;
   }
 
+  if (s->priority >= 0)
+    if (sk_set_priority(s, s->priority) < 0)
+      return -1;
+
   return 0;
 }
 
-- 
2.15.1
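
To see why 0xc0 ends up as priority 0, it is enough to apply the mask quoted in the commit message. This sketch only shows the masking step; the kernel's table mapping the masked value to a priority is not reproduced, and the names are made up for the example.

```c
#include <assert.h>

/* Mask quoted in the commit message above. */
#define TOS_PRIORITY_MASK 0x1e

/* Bits of the ToS byte that take part in the ToS-to-priority lookup. */
int tos_bits_used_for_priority(int tos)
{
  assert(tos >= 0 && tos <= 0xff);
  return tos & TOS_PRIORITY_MASK;
}
```

`0xc0 & 0x1e` is 0, so none of the bits of 0xc0 take part in the lookup, and setting that ToS after the priority resets the priority to the value associated with 0.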



Re: birdc 2.0.0 crashes

2017-12-14 Thread Vincent Bernat
 ❦ 14 December 2017 16:48 +0100, Clemens Schrimpe:

> Nope - it is 2:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x779b6a0b in previous_history () at 
> /build/readline6-RKA9OI/readline6-6.3/history.c:185
> 185   /build/readline6-RKA9OI/readline6-6.3/history.c: No such file or 
> directory.
> (gdb) p history_offset
> $1 = 2
> (gdb) p *(the_history[2])
> Cannot access memory at address 0x10
>
> We are up to something! :-)
>
>   -c

You could start birdc from gdb:

gdb --args birdc

Then, add a watchpoint on history_offset:

watch history_offset

Then "run". Each time gdb stops because history_offset changed, grab a
"bt". Maybe it's not initialized to 0 properly? It should be increased
by add_history calls.

I don't see anything suspicious in bird about all that (it's all handled
by libreadline itself).
-- 
Make input easy to proofread.
- The Elements of Programming Style (Kernighan & Plauger)



Re: birdc 2.0.0 crashes

2017-12-14 Thread Vincent Bernat
 ❦ 14 December 2017 13:13 +0100, Clemens Schrimpe:

> (gdb) p the_history
> $1 = (HIST_ENTRY **) 0x609470
> (gdb) p the_history[0]
> $2 = (HIST_ENTRY *) 0x609630
> (gdb) p *(the_history[0])
> $3 = {line = 0x609650 "show protocols ", timestamp = 0x609610 "", data = 0x0}
>
>
> Double-plus-weird!

And "p history_offset"? Should be 0.
-- 
Replace repetitive expressions by calls to a common function.
- The Elements of Programming Style (Kernighan & Plauger)



Re: birdc 2.0.0 crashes

2017-12-14 Thread Vincent Bernat
 ❦ 13 December 2017 22:56 +0100, Clemens Schrimpe:

> bird> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x779b6a0b in previous_history () at 
> /build/readline6-RKA9OI/readline6-6.3/history.c:185
> 185   /build/readline6-RKA9OI/readline6-6.3/history.c: No such file or 
> directory.
> (gdb) bt
> #0  0x779b6a0b in previous_history () at 
> /build/readline6-RKA9OI/readline6-6.3/history.c:185
> #1  0x779b65e5 in rl_get_previous_history (count=, 
> key=)
> at /build/readline6-RKA9OI/readline6-6.3/misc.c:609
> #2  0x7799b990 in _rl_dispatch_subseq (key=65, map=, 
> got_subseq=0)
> at /build/readline6-RKA9OI/readline6-6.3/readline.c:832
> #3  0x7799c202 in _rl_dispatch_callback (cxt=0x61cb10) at 
> /build/readline6-RKA9OI/readline6-6.3/readline.c:736
> #4  0x779b27ff in rl_callback_read_char () at 
> /build/readline6-RKA9OI/readline6-6.3/callback.c:188
> #5  0x00403885 in input_read () at client/birdc.c:219
> #6  0x004020d3 in select_loop () at client/client.c:375
> #7  main (argc=, argv=) at client/client.c:447
>
> Maybe this provides a hint?!

Use "bt full".

What's "p the_history" and "p the_history[0]"?
-- 
Having nothing, nothing can he lose.
-- William Shakespeare, "Henry VI"



Re: Issues establishing more than 2 BGP sessions

2017-11-12 Thread Vincent Bernat
 ❦ 11 November 2017 23:44 -0600, Chris Stein:

> Individually, bird is able to establish a session on both tunnels at every
> remote VPC, so I know that works. Occasionally, I have noticed that
> established connections will disconnect with a “Hold timer expired”.
> There’s something I’m missing/overlooking in the config to allow all
> sessions to be active.

I think BIRD is receiving a remote route that would replace the route
used to reach the neighbor. Are you using route-based tunnels (with VTI
interfaces)? If yes, "ip route show" output would help to
understand. Otherwise, "ip xfrm policy" would help.

If you want a working setup similar to yours (a tad more complex since
it involves multiple routing tables), here is one:

 https://vincent.bernat.im/en/blog/2017-route-based-vpn
-- 
Use self-identifying input.  Allow defaults.  Echo both on output.
- The Elements of Programming Style (Kernighan & Plauger)



Re: Bird for RTBH trigger

2017-10-03 Thread Vincent Bernat
 ❦  3 October 2017 20:10 -0400, Robert Blayzor:

> protocol static trig1 {
> route 192.0.2.0/24 blackhole;
> route 192.168.255.254/32 via 192.0.2.1;

Why not just "route 192.168.255.254/32 blackhole"?
-- 
The ripest fruit falls first.
-- William Shakespeare, "Richard II"



Re: Kernel dropped some netlink messages, will resync on next scan.

2017-09-13 Thread Vincent Bernat
 ❦ 13 September 2017 12:34 +0200, "Giuseppe Ravasio (LU)":

> I did some fishing with the ip monitor route and the message gets
> printed only when more than about 500/600 routes gets changed, and so
> 600 are added and >600 are deleted from the routing table.
>
>
> Do you think that there is something that can be tuned to avoid this
> message dropping by the kernel?

Increasing net.core.rmem_default should do the trick (before starting
BIRD). Otherwise, this is not a real worry: internally, unless you have
another process pushing routes, BIRD already has a correct view of the
routing table.
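
`net.core.rmem_default` is the receive buffer size a socket gets when the application does not set one itself. A small sketch to check what a given socket ended up with; the helper name is made up, and whether BIRD overrides the buffer size on its netlink socket is not covered here.

```c
#include <assert.h>
#include <sys/socket.h>

/* Return the receive buffer size the kernel reports for fd, or -1 on error. */
int socket_rcvbuf(int fd)
{
  int val = 0;
  socklen_t len = sizeof(val);

  assert(fd >= 0);
  if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
    return -1;
  return val;
}
```

Calling it on a fresh `socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)` before and after changing the sysctl should show the difference.
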
-- 
By trying we can easily learn to endure adversity.  Another man's, I mean.
-- Mark Twain



Re: Kernel dropped some netlink messages, will resync on next scan.

2017-09-12 Thread Vincent Bernat
 ❦ 12 September 2017 11:19 +0200, "Giuseppe Ravasio (LU)":

> For IPv4 daemon this seems to happen *a lot* when I start the BGP
> session with the iBGP enabled (and also this I think could be
> acceptable), but the message gets printed sometimes (a single line every
> 5/10 minutes) when the BGP is running normally.

In a terminal, also run "ip monitor route" and check if there is
anything fishy in it.
-- 
Perilous to all of us are the devices of an art deeper than we ourselves
possess.
-- Gandalf the Grey [J.R.R. Tolkien, "Lord of the
Rings"]



Re: Fix IPv6 ECMP handling with 4.11+ kernel

2017-09-01 Thread Vincent Bernat
 ❦  1 September 2017 13:12 +0200, Ondrej Zajicek:

>> > Also, alien routes are correctly parsed when next-hops are correctly
>> > ordered (I didn't check if this restriction is also present for IPv4 or
>> > if Linux is always sending IPv4 multipath routes with next-hops
>> > correctly ordered)
>> 
>> It's the same for IPv4. Dunno if it should be considered as a bug?
>
> Not sure what is a problem here. Do you mean that alien routes are not
> correctly parsed when next-hops are not ordered? How does the problem
> manifests?

Yes.

2017-09-01T13:19:20.741524+02:00 V1-1 bird6: Ignoring unsorted multipath route 2001:db8:a5::/64 received via kernel1

The route:

2001:db8:a5::/64 metric 1024
nexthop via 2001:db8:ff::3  dev vti4 weight 1
nexthop via 2001:db8:ff::1  dev vti3 weight 1

Same for IPv4.
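
To make "unsorted" concrete, here is a small sketch that checks whether a list of gateways is in ascending address order. The ordering rule is an assumption that matches the examples in this thread (`::3` before `::1` is rejected, `::5` before `::7` is accepted); BIRD's actual comparison may also consider other fields, and the function name is made up.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Returns 1 if the IPv6 gateways are in ascending order, 0 if not,
 * and -1 if one of them cannot be parsed. */
int gateways_sorted(const char *gw[], int n)
{
  struct in6_addr prev, cur;

  assert(gw != NULL && n >= 0);
  memset(&prev, 0, sizeof(prev));
  for (int i = 0; i < n; i++) {
    if (inet_pton(AF_INET6, gw[i], &cur) != 1)
      return -1;
    /* Addresses are stored in network byte order, so memcmp()
     * compares them numerically. */
    if (i > 0 && memcmp(&prev, &cur, sizeof(cur)) > 0)
      return 0;
    prev = cur;
  }
  return 1;
}
```
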
-- 
Identify bad input; recover if possible.
- The Elements of Programming Style (Kernighan & Plauger)



Re: Fix IPv6 ECMP handling with 4.11+ kernel

2017-09-01 Thread Vincent Bernat
 ❦ 31 August 2017 22:39 +0200, Vincent Bernat <ber...@luffy.cx>:

> Also, alien routes are correctly parsed when next-hops are correctly
> ordered (I didn't check if this restriction is also present for IPv4 or
> if Linux is always sending IPv4 multipath routes with next-hops
> correctly ordered)

It's the same for IPv4. Dunno if it should be considered as a bug?
-- 
Use the "telephone test" for readability.
- The Elements of Programming Style (Kernighan & Plauger)



Fix IPv6 ECMP handling with 4.11+ kernel

2017-08-31 Thread Vincent Bernat
Hey!

Starting from kernel 4.11 (commit beb1afac518d), IPv6 ECMP routes are
notified using RTA_MULTIPATH, like IPv4 routes. BIRD does not handle those
routes correctly; this patch fixes that. It also enables alien routes to be
parsed correctly. Route modifications are still done the old way: for
insertion/deletion, there is nothing to gain from optimizing, and for
replace, the IPv4 case is not optimized either. It should be
possible to detect appropriate support for RTA_MULTIPATH when receiving
an IPv6 route with this attribute, but I don't see how it would be
helpful, so I didn't do it (simpler code this way).

I did some quick tests and routes are removed/added correctly:

Deleted 2001:db8:a3::/64 proto bird metric 1024
nexthop via 2001:db8:ff::5  dev vti5 weight 1
nexthop via 2001:db8:ff::7  dev vti6 weight 1
2001:db8:a3::/64 via 2001:db8:ff::7 dev vti6 proto bird metric 1024  pref medium

Deleted 2001:db8:a3::/64 via 2001:db8:ff::7 dev vti6 proto bird metric 1024  pref medium
2001:db8:a3::/64 via 2001:db8:ff::5 dev vti5 proto bird metric 1024  pref medium
2001:db8:a3::/64 proto bird metric 1024
nexthop via 2001:db8:ff::7  dev vti6 weight 1
nexthop via 2001:db8:ff::5  dev vti5 weight 1

Also, alien routes are correctly parsed when next-hops are correctly
ordered (I didn't check if this restriction is also present for IPv4 or
if Linux is always sending IPv4 multipath routes with next-hops
correctly ordered):

2001:db8:a4::/64 metric 1024
nexthop via 2001:db8:ff::5  dev vti5 weight 1
nexthop via 2001:db8:ff::7  dev vti6 weight 1

2001:db8:a4::/64   multipath [kernel1 22:30:54] * (10)
via 2001:db8:ff::5 on vti5 weight 1
via 2001:db8:ff::7 on vti6 weight 1

I'll try to investigate that unless someone already knows the answer.
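
For readers unfamiliar with the attribute: the payload of RTA_MULTIPATH is a sequence of `struct rtnexthop` records that can be walked with the `RTNH_*` macros from `<linux/rtnetlink.h>`. A minimal sketch that only counts the next hops (the function name is made up; this is not the code from the patch):

```c
#include <assert.h>
#include <linux/rtnetlink.h>
#include <stddef.h>

/* Count the next-hop records in an RTA_MULTIPATH payload of `len` bytes. */
int count_nexthops(void *payload, int len)
{
  struct rtnexthop *nh = payload;
  int n = 0;

  assert(payload != NULL || len == 0);
  /* Check there is room for a header before RTNH_OK() reads rtnh_len. */
  while (len >= (int) sizeof(*nh) && RTNH_OK(nh, len)) {
    n++;
    len -= RTNH_ALIGN(nh->rtnh_len);
    nh = RTNH_NEXT(nh);
  }
  return n;
}
```

A real parser would also read the attributes nested in each record, such as RTA_GATEWAY, to recover the gateway of every next hop.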

From 705d3b93f0527a693cab38357a68c7598a4039cc Mon Sep 17 00:00:00 2001
From: Vincent Bernat <vinc...@bernat.im>
Date: Thu, 31 Aug 2017 21:47:52 +0200
Subject: [PATCH] KRT: Fix IPv6 ECMP with 4.11+ kernels

Starting from kernel 4.11 (commit beb1afac518d), IPv6 ECMP routes are
notified using RTA_MULTIPATH, like IPv4 routes. BIRD does not handle those
routes correctly; this patch fixes that. It also enables alien routes to be
parsed correctly. Route modifications are still done the old
way.
---
 sysdep/linux/netlink.c | 51 --
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/sysdep/linux/netlink.c b/sysdep/linux/netlink.c
index 22313f439977..6a3972faea2a 100644
--- a/sysdep/linux/netlink.c
+++ b/sysdep/linux/netlink.c
@@ -59,22 +59,26 @@
 /*
  * Structure nl_parse_state keeps state of received route processing. Ideally,
  * we could just independently parse received Netlink messages and immediately
- * propagate received routes to the rest of BIRD, but Linux kernel represents
- * and announces IPv6 ECMP routes not as one route with multiple next hops (like
- * RTA_MULTIPATH in IPv4 ECMP), but as a set of routes with the same prefix.
+ * propagate received routes to the rest of BIRD, but older Linux kernel (before
+ * 4.11) represents and announces IPv6 ECMP routes not as one route with
+ * multiple next hops (like RTA_MULTIPATH in IPv4 ECMP), but as a set of routes
+ * with the same prefix. More recent kernels work as with IPv4.
  *
  * Therefore, BIRD keeps currently processed route in nl_parse_state structure
  * and postpones its propagation until we expect it to be final; i.e., when
  * non-matching route is received or when the scan ends. When another matching
  * route is received, it is merged with the already processed route to form an
  * ECMP route. Note that merging is done only for IPv6 (merge == 1), but the
- * postponing is done in both cases (for simplicity). All IPv4 routes are just
- * considered non-matching.
+ * postponing is done in both cases (for simplicity). All IPv4 routes or IPv6
+ * routes with RTA_MULTIPATH set are just considered non-matching.
  *
  * This is ignored for asynchronous notifications (every notification is handled
  * as a separate route). It is not an issue for our routes, as we ignore such
  * notifications anyways. But importing alien IPv6 ECMP routes does not work
- * properly.
+ * properly with older kernels.
+ *
+ * Whatever the kernel version is, IPv6 ECMP routes are sent as multiple routes
+ * for the same prefix.
  */
 
 struct nl_parse_state
@@ -320,9 +324,15 @@ static struct nl_want_attrs ifa_attr_want6[BIRD_IFA_MAX] = {
 
 #define BIRD_RTA_MAX  (RTA_TABLE+1)
 
+#ifndef IPV6
 static struct nl_want_attrs mpnh_attr_want4[BIRD_RTA_MAX] = {
   [RTA_GATEWAY]	  = { 1, 1, sizeof(ip4_addr) },
 };
+#else
+static struct nl_want_attrs mpnh_attr_want6[BIRD_RTA_MAX] = {
+  [RTA_GATEWAY]	  = { 1, 1, sizeof(ip6_addr) },
+};
+#endif
 
 #ifndef IPV6
 static struct nl_want_attrs rtm_attr_want4[BIRD_RTA_MAX] = {
@@ -345,6 +355,7 @@ static struct nl_want_attrs rtm_attr_want6[BIRD_RTA_MAX] = {
   [RTA_PRIORITY]  = { 1, 

Re: [RFC] filter profiling

2017-08-29 Thread Vincent Bernat
 ❦ 29 August 2017 10:42 +0300, Lennert Buytenhek:

>> > Feedback appreciated!  (Better ideas also appreciated. :D)
>> 
>> Using USDT probes? You can attach arbitrary strings to them. I know perf
>> supports them (with a recent kernel) but I don't know how
>> exactly. However, with systemtap, it's dead easy to see them:
>> 
>> stap -e 'probe bird.* { print($$vars) }'
>> 
>> For implementation, see:
>>  
>> https://github.com/vincentbernat/lldpd/commit/bdfe419389075af3cdfeadc78008a157afc2d5d7
>>  
>> https://github.com/vincentbernat/lldpd/commit/140e34f057462d5ff7818ab4af368f055eaad4e3
>
> As far as I can see, these are tracepoints, but they wouldn't let me do
> profiling?  What I need is profiling, as I want to know what's consuming
> the most CPU, so I want to be able to fire an event N times per second
> to tell me what bird is doing right at that specific moment, and listing
> or counting tracepoint invocations won't necessarily tell me what's using
> up the most CPU.

I didn't look at your patch, but from my understanding, when you see a
CPU spike, you send USR2 and BIRD starts "tracing". Tracepoints would
do the same thing without having to send a signal.

You could also have one tracepoint for "begin" and one for "end":
record the current filter when "begin" fires, clear it when "end"
fires, and sample on CPU to learn which filter is running whenever BIRD
is on a CPU.
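
A minimal sketch of such a begin/end pair, assuming the DTRACE_PROBE1
macro from <sys/sdt.h> (shipped with the SystemTap SDT development
files). The provider name "bird", the probe names and
run_filter_traced() are all made up for illustration; nothing like this
exists in BIRD as far as I know. When the header is not available, the
probes compile to nothing.

```c
#include <assert.h>

#if defined(__has_include)
#  if __has_include(<sys/sdt.h>)
#    include <sys/sdt.h>
#    define HAVE_SDT 1
#  endif
#endif
#ifndef HAVE_SDT
/* Fallback when SDT headers are missing: the probe is a no-op. */
#  define DTRACE_PROBE1(provider, name, arg) ((void) (arg))
#endif

typedef int (*filter_fn)(void);

/* Stand-in for a real filter; always accepts. */
int sample_accept(void)
{
  return 1;
}

/* Fire a probe before and after the filter runs, carrying its name,
 * so that an external tracer knows which filter is current. */
int run_filter_traced(const char *name, filter_fn fn)
{
  int verdict;

  assert(name != 0 && fn != 0);

  DTRACE_PROBE1(bird, filter__begin, name);
  verdict = fn();
  DTRACE_PROBE1(bird, filter__end, name);

  return verdict;
}
```
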
-- 
Conscience doth make cowards of us all.
-- Shakespeare



Re: [RFC] filter profiling

2017-08-29 Thread Vincent Bernat
 ❦ 29 August 2017 06:14 +0300, Lennert Buytenhek:

> Feedback appreciated!  (Better ideas also appreciated. :D)

Using USDT probes? You can attach arbitrary strings to them. I know perf
supports them (with a recent kernel) but I don't know how
exactly. However, with systemtap, it's dead easy to see them:

stap -e 'probe bird.* { print($$vars) }'

For implementation, see:
 
https://github.com/vincentbernat/lldpd/commit/bdfe419389075af3cdfeadc78008a157afc2d5d7
 
https://github.com/vincentbernat/lldpd/commit/140e34f057462d5ff7818ab4af368f055eaad4e3
-- 
Say what you mean, simply and directly.
- The Elements of Programming Style (Kernighan & Plauger)



Re: { } in BGP.as_path

2017-08-25 Thread Vincent Bernat
 ❦ 25 August 2017 19:27 +0300, Mikhail Mayorov:

> Hi all!
>
> What is mean '{}' in as_path?

This is a set. This part is not ordered. With only one AS, this doesn't
make much sense, but it can contain several of them. It usually means the
route is an aggregation of several routes.
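
To make the distinction concrete, here is a small sketch of a path made
of ordered sequences and unordered sets. The struct and function names
are hypothetical, and the output format is only illustrative. The one
rule it relies on comes from RFC 4271: when comparing path lengths, a
whole set counts as one hop, however many ASes it contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum seg_type { SEG_SEQUENCE, SEG_SET };

struct as_segment {
  enum seg_type type;
  size_t count;           /* number of ASes in the segment */
  const unsigned *asns;
};

/* Path length for route selection: a set counts as a single hop. */
size_t as_path_length(const struct as_segment *segs, size_t n)
{
  size_t len = 0;

  for (size_t i = 0; i < n; i++)
    len += (segs[i].type == SEG_SET) ? 1 : segs[i].count;
  return len;
}

/* Render the path with sets in braces, e.g. "65001 65002 {65010 65011}".
 * Returns the number of characters written, or -1 if buf is too small. */
int as_path_format(const struct as_segment *segs, size_t n,
                   char *buf, size_t size)
{
  size_t pos = 0;

  assert(buf != NULL && size > 0);
  buf[0] = '\0';

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < segs[i].count; j++)
      {
        int set = (segs[i].type == SEG_SET);
        int w = snprintf(buf + pos, size - pos, "%s%s%u%s",
                         pos ? " " : "",
                         (set && j == 0) ? "{" : "",
                         segs[i].asns[j],
                         (set && j + 1 == segs[i].count) ? "}" : "");
        if (w < 0 || (size_t) w >= size - pos)
          return -1;
        pos += (size_t) w;
      }
  return (int) pos;
}
```
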
-- 
Use the fundamental control flow constructs.
- The Elements of Programming Style (Kernighan & Plauger)



Re: Next-hop check not needed?

2017-07-04 Thread Vincent Bernat
 ❦  4 July 2017 14:49 +0200, Ondrej Zajicek:

> The patch looks fine, but there is one conceptual problem - onlink flag
> is not really property of a route, but property of a next hop, so your
> patch is incompatible with ECMP. Unfortunately, we currently do not have
> similar flexile attribute mechanism for per-next-hop data.

Understood. Fortunately, in my case, those routes are not really needed
locally, so I can just filter them and announce them only over
BGP. Thanks for the help.
-- 
Localise input and output in subroutines.
- The Elements of Programming Style (Kernighan & Plauger)



Re: Next-hop check not needed?

2017-06-29 Thread Vincent Bernat
 ❦ 29 June 2017 11:02 +0200, Vincent Bernat <ber...@luffy.cx>:

>> On the next pass, BIRD doesn't recognize it (KRT: Received route
>> 203.0.113.10/32 with strange next-hop 203.0.113.1) and tries to install
>> it again but it already exists (Netlink: File exists). I'll dig more to
>> find a workaround.
>
> I didn't find any work-around, so I made this patch instead. Works for
> me.

It doesn't replace an existing route when only the flag is flipped. I
don't know what the best way to do that is.
-- 
This was the most unkindest cut of all.
-- William Shakespeare, "Julius Caesar"



Re: Next-hop check not needed?

2017-06-29 Thread Vincent Bernat
 ❦ 28 June 2017 16:12 +0200, Vincent Bernat <ber...@luffy.cx>:

>> Note that when learning route from the kernel you could workaround it by
>> using 'onlink' route flag.
>
> Great!
>
> It seems to work:
>
> $ ip route show table public dev eth2
> 203.0.113.1 scope link metric 10
> 203.0.113.10 via 203.0.113.1 metric 10 onlink
>
> bird> show route table public
> 203.0.113.10/32    via 203.0.113.1 on eth2 [kernel_public 15:58:09] * (200)
> 203.0.113.1/32     dev eth2 [kernel_public 15:58:09] * (200)
>
> But BIRD (1.6.3) still seems to be a bit confused, since I got this
> message in a loop:
>
> 2017-06-28 16:01:20  KRT: Received route 203.0.113.10/32 with strange next-hop 203.0.113.1
> 2017-06-28 16:01:20  Netlink: File exists
>
> Other than that, everything works as expected. I don't know how to
> interpret the first message (is it receiving two routes?) but the second
> message seems to say that it tries to reinstall the same route it
> received.

After investigating a bit more, the problem is triggered by the fact
that BIRD doesn't install the route with the "onlink" attribute. I am
using this kind of configuration to maintain a stripped version of the
routing table for local use:

table local_out;
protocol kernel kernel_local_out {
  persist;
  import none;
  export filter {
    krt_prefsrc = loopback_private;
    accept;
  };
  scan time 10;
  kernel table 100;
  table local_out;
  device routes yes;
  merge paths yes;
}
protocol pipe private_local_out {
  table private;
  peer table local_out;
  import none;
  export all;
}
protocol pipe public_local_out {
  table public;
  peer table local_out;
  import none;
  export filter {
    if proto = "kernel_public" then accept;
    reject;
  };
}

So, when the route is copied to the "local_out" table, it is copied
without the "onlink" parameter:

$ ip route show table public dev eth2
203.0.113.1 scope link metric 10
203.0.113.10 via 203.0.113.1 metric 10 onlink
$ ip route show table local-out dev eth2
203.0.113.1 proto bird scope link src 172.22.2.1 metric 10
203.0.113.10 via 203.0.113.1 proto bird src 172.22.2.1 metric 10

On the next pass, BIRD doesn't recognize it (KRT: Received route
203.0.113.10/32 with strange next-hop 203.0.113.1) and tries to install
it again but it already exists (Netlink: File exists). I'll dig more to
find a workaround.
-- 
"Elves and Dragons!" I says to him.  "Cabbages and potatoes are better
for you and me."
-- J. R. R. Tolkien



Re: Next-hop check not needed?

2017-06-28 Thread Vincent Bernat
 ❦ 28 June 2017 14:01 +0200, Ondrej Zajicek:

> Note that when learning route from the kernel you could workaround it by
> using 'onlink' route flag.

Great!

It seems to work:

$ ip route show table public dev eth2
203.0.113.1 scope link metric 10
203.0.113.10 via 203.0.113.1 metric 10 onlink

bird> show route table public
203.0.113.10/32    via 203.0.113.1 on eth2 [kernel_public 15:58:09] * (200)
203.0.113.1/32     dev eth2 [kernel_public 15:58:09] * (200)

But BIRD (1.6.3) still seems to be a bit confused, since I got this
message in a loop:

2017-06-28 16:01:20  KRT: Received route 203.0.113.10/32 with strange next-hop 203.0.113.1
2017-06-28 16:01:20  Netlink: File exists

Other than that, everything works as expected. I don't know how to
interpret the first message (is it receiving two routes?) but the second
message seems to say that it tries to reinstall the same route it
received.
-- 
10.0 times 0.1 is hardly ever 1.0.
- The Elements of Programming Style (Kernighan & Plauger)



Next-hop check not needed?

2017-06-21 Thread Vincent Bernat
Hey

in netlink.c, there is this piece of code:

#v+
  if (a[RTA_GATEWAY])
    {
      neighbor *ng;
      ra->dest = RTD_ROUTER;
      memcpy(&ra->gw, RTA_DATA(a[RTA_GATEWAY]), sizeof(ra->gw));
      ipa_ntoh(ra->gw);

#ifdef IPV6
      /* Silently skip strange 6to4 routes */
      if (ipa_in_net(ra->gw, IPA_NONE, 96))
        return;
#endif

      ng = neigh_find2(&p->p, &ra->gw, ra->iface,
                       (i->rtm_flags & RTNH_F_ONLINK) ? NEF_ONLINK : 0);
      if (!ng || (ng->scope == SCOPE_HOST))
        {
          log(L_ERR "KRT: Received route %I/%d with strange next-hop %I",
              net->n.prefix, net->n.pxlen, ra->gw);
          return;
        }
    }
#v-

In turn, it calls neigh_find2() which delegates the decision to
if_connected(). if_connected() will return an error if it thinks that
the gateway is not part of the prefix of one of the configured IP
addresses.

This means that if I have the following setup:

ip route add 203.0.113.4/32 dev vnet2
ip route add 203.0.113.15/32 via 203.0.113.4 dev vnet2

BIRD will complain about "strange next-hop" while the setup is perfectly
valid. I don't need an IP address on an interface to route traffic on
it.
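
To illustrate the behaviour described above, here is a simplified,
IPv4-only model of the check. It is not BIRD's if_connected(): the
types, IP4() and gw_is_neighbor() are hypothetical, and the real
function handles more cases. It only shows why a gateway outside every
configured prefix is rejected unless the route carries the onlink flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
  (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
   ((uint32_t) (c) << 8) | (uint32_t) (d))

/* One address configured on the interface, with its prefix length. */
struct if_addr {
  uint32_t addr;
  unsigned pxlen;
};

static uint32_t prefix_mask(unsigned pxlen)
{
  /* Avoid an undefined 32-bit shift for a zero-length prefix. */
  return pxlen ? ~(uint32_t) 0 << (32 - pxlen) : 0;
}

/* Return 1 if `gw` would be accepted as a neighbor on the interface. */
int gw_is_neighbor(uint32_t gw, const struct if_addr *addrs, size_t n,
                   int onlink)
{
  if (onlink)
    return 1;   /* the kernel vouches for the gateway */

  for (size_t i = 0; i < n; i++)
    {
      uint32_t m;

      assert(addrs[i].pxlen <= 32);
      m = prefix_mask(addrs[i].pxlen);
      if ((gw & m) == (addrs[i].addr & m))
        return 1;
    }
  return 0;     /* reported as "strange next-hop" */
}
```

With no address on vnet2, 203.0.113.4 matches no prefix, so the second
route above is rejected by this model even though the kernel accepted it.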

I would think the check should just be removed. If the kernel says this
is a valid gateway, it likely is. However, I can't find the reason the
check was added.

In the past, this was just a warning:

#v+
+ ng = neigh_find(&p->p, &ra.gw, 0);
+ if (ng)
+   ra.iface = ng->iface;
+ else
+   /* FIXME: Remove this warning? */
+   log(L_WARN "Kernel told us to use non-neighbor %I for %I/%d", ra.gw, net->n.prefix, net->n.pxlen);
#v-

(commit aa64578641c15b137172acc927d9d7af5914576b)

But this was made an error in commit
9d4d38d1a5d67f5485d2b2fa439c879583dfdcb0.

#v+
- ng = neigh_find(&p->p, &ra.gw, 0);
+ ng = neigh_find2(&p->p, &ra.gw, ifa, 0);
  if (ng && ng->scope)
+ {
+   if (ng->iface != ifa)
+     log(L_WARN "KRT: Route with unexpected iface for %I/%d", net->n.prefix, net->n.pxlen);
    ra.iface = ng->iface;
+ }
  else
-   /* FIXME: Remove this warning? Handle it somehow... */
+ {
    log(L_WARN "Kernel told us to use non-neighbor %I for %I/%d", ra.gw, net->n.prefix, net->n.pxlen);
+   return;
+ }
#v-

The commit message says "Fixes some problems related to link-local
routes in KRT interface". Does this ring a bell?
-- 
Avoid temporary variables.
- The Elements of Programming Style (Kernighan & Plauger)


Re: Bird dying on nl_get_reply

2017-05-04 Thread Vincent Bernat
 ❦  4 May 2017 22:37 +0100, "Israel G. Lugo":

>> On the hundreds BIRD running on our systems for a few months, it's the
>> first occurrence I had. The server doesn't have that many routes (less
>> than a hundred) either (we have servers with far more routes). I'll wait
>> if it happens again.
>
> Is the server perhaps running other software that writes to netlink? My
> problem that lead to the fix on async socket was related to conntrackd,
> making huge amounts of writes. Perhaps that, or some other kind of
> low-level network related software?

The other daemons are lldpd and ulogd. ulogd didn't log anything in the
same timeframe.
-- 
No violence, gentlemen -- no violence, I beg of you!  Consider the furniture!
-- Sherlock Holmes



Re: Bird dying on nl_get_reply

2017-05-02 Thread Vincent Bernat
 ❦  2 May 2017 16:47 +0200, Ondrej Zajicek:

>> Or maybe we could just ignore the error and wait for
>> the next kernel sync to catch up. Or the 8192 value could be configured
>> at build-time. What's the best option?
>
> Well, you could try increase NL_RX_SIZE to say 64k. But the best solution
> would be to have a proper error handling in nl_get_reply(). The main
> question is why the buffer was full as it is not the buffer that get
> async notifications, it just gets responses.

Across the hundreds of BIRD instances that have been running on our
systems for a few months, this is the first occurrence I have seen. The
server doesn't have that many routes either (fewer than a hundred; we
have servers with far more). I'll wait and see if it happens again.
-- 
10.0 times 0.1 is hardly ever 1.0.
- The Elements of Programming Style (Kernighan & Plauger)



Bird dying on nl_get_reply

2017-05-02 Thread Vincent Bernat
Hey!

Just got an instance of BIRD dying unexpectedly after displaying the
following message:

nl_get_reply: No buffer space available

It's from netlink.c:

  int x = recvmsg(nl->fd, &m, 0);
  if (x < 0)
    die("nl_get_reply: %m");

Manpage for netlink(7) says an application should expect such a
condition:

   However, reliable transmissions from kernel to user are
   impossible in any case.  The kernel can't send a netlink message
   if the socket buffer is full: the message will be dropped and the
   kernel and the user-space process will no longer have the same
   view of kernel state.  It is up to the application to detect when
   this happens (via the ENOBUFS error returned by recvmsg(2)) and
   resynchronize.

Another possibility would be to use NETLINK_NO_ENOBUFS socket option:

   This flag can be used by unicast and broadcast listeners to avoid
   receiving ENOBUFS errors.

I don't think using this flag is a good idea.

I thought this problem had already been reported recently, but I
couldn't find the thread again. The receive buffer could be increased dynamically
when this happens. Or maybe we could just ignore the error and wait for
the next kernel sync to catch up. Or the 8192 value could be configured
at build-time. What's the best option?
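
To make the options concrete, here is a rough sketch of what handling
the error could look like instead of calling die(). The enum,
nl_classify_recv() and nl_grow_rcvbuf() are hypothetical names, not
BIRD code. The only real interfaces used are errno values and
setsockopt() with SO_RCVBUF, which the kernel may cap (see
net.core.rmem_max).

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

enum nl_rx_action {
  NL_RX_OK,      /* message received, parse it */
  NL_RX_RETRY,   /* interrupted, call recvmsg() again */
  NL_RX_RESYNC,  /* messages were dropped, schedule a full rescan */
  NL_RX_FATAL    /* anything else is still treated as fatal */
};

/* Decide what to do after recvmsg() returned `rv` with errno `err`. */
enum nl_rx_action nl_classify_recv(long rv, int err)
{
  if (rv >= 0)
    return NL_RX_OK;

  assert(err != 0);
  switch (err)
    {
    case EINTR:   return NL_RX_RETRY;
    case ENOBUFS: return NL_RX_RESYNC;
    default:      return NL_RX_FATAL;
    }
}

/* Try to double the socket receive buffer; returns 0 on success and
 * updates *size, or -1 and leaves *size untouched. */
int nl_grow_rcvbuf(int fd, int *size)
{
  int wanted = *size * 2;

  if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &wanted, sizeof(wanted)) < 0)
    return -1;
  *size = wanted;
  return 0;
}
```

On NL_RX_RESYNC, the caller would combine the two ideas: grow the
buffer, then wait for (or trigger) the next kernel scan.
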
-- 
10.0 times 0.1 is hardly ever 1.0.
- The Elements of Programming Style (Kernighan & Plauger)


Re: BGP graceful restart and BFD

2016-12-02 Thread Vincent Bernat
 ❦  2 December 2016 17:11 +0100, Ondrej Zajicek:

> There are three different cases:
>
> 1) regular (administartive) shutdown/restart
> 2) planned graceful restart (e.g. software version update)
> 3) unplanned graceful restart (e.g. software crash and respawn)
>
> Regular shutdown command does (1), so it is expected to see regular BGP
> session shutdown. Case (3) should work without much problems. But there
> is no explicit support for case (2), you have to use kill -9 as we are
> missing some command that explicitly activates graceful restart.
>
>
>> The second problem I run into is when using BFD. If I kill -9 bird, BFD
>> will quickly detects the problem and shutdown the BGP session. It will
>> not be considered a graceful restart either.
>
> We should have better handling of C-bit in BFD (for example, we have
> the same behavior regardless of neighbor's C-bit value). But still
> there is a fundamental limitation of having BFD in control plane or
> even in the same process.
>
> There is one potential solution - for case (2), we could explicitly
> shutdown BFD sessions when graceful restart is requested. As graceful
> restart is just an avisory mechanism, BGP should survive shutdown of
> BFD session, then regular BGP graceful restart should work.

It seems easy enough to do. I may have a look at this since it's my
main interest. I don't expect things to crash and can live without a
graceful restart in that case. Would it be better to trigger the
graceful restart with a signal or through the control socket?
-- 
The human race has one really effective weapon, and that is laughter.
-- Mark Twain



BGP graceful restart and BFD

2016-12-02 Thread Vincent Bernat
Hey!

I am trying to make BGP graceful restart work. First, I noticed that BGP
graceful restart can only work if BIRD doesn't cleanly close the BGP
session. Otherwise, an administrative shutdown is sent and the other end
(also BIRD) flushes all routes and doesn't consider this a graceful
restart.

2016-12-02 10:09:24  R1: Received: Administrative shutdown
2016-12-02 10:09:24  R1: BGP session closed
2016-12-02 10:09:24  R1: State changed to stop
2016-12-02 10:09:24  R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Is that the expected behavior?

The second problem I ran into is with BFD. If I kill -9 bird, BFD
quickly detects the problem and shuts down the BGP session. This is
not considered a graceful restart either.

2016-12-02 10:52:50  R1: Neighbor graceful restart detected
2016-12-02 10:52:50  R1: State changed to start
2016-12-02 10:52:50  R1: BGP session closed
2016-12-02 10:52:50  R1: Connect delayed by 5 seconds
2016-12-02 10:52:51  R1: BFD session down
2016-12-02 10:52:51  R1: State changed to stop
2016-12-02 10:52:51  R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Therefore, BFD seems incompatible with graceful restart. The Juniper
implementation has some provisions to make BFD and BGP graceful restart
work together:

> So that BFD can maintain its BFD protocol sessions across a BGP
> graceful restart, BGP requests that BFD set the C bit to 1 in
> transmitted BFD packets. When the C bit is set to 1, BFD can
> maintain its session in the forwarding plane in spite of disruptions
> in the control plane. Setting the bit to 1 gives BGP neighbors
> acting as a graceful restart helper the most accurate information
> about whether the forwarding plane is up.
>
> When BGP is acting as a graceful restart helper and the BFD session
> to the BGP peer is lost, one of the following actions takes place:
>  - If the C bit received in the BFD packets was 1, BGP immediately
>    flushes all routes, determining that the forwarding plane on the
>    BGP peer has gone down.
>  - If the C bit received in the BFD packets was 0, BGP marks all
>    routes as stale but does not flush them because the forwarding
>    plane on the BGP peer might be working and only the control plane
>    has gone down.
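
The helper-side rule quoted above boils down to a small decision
function. This is only a sketch of the documented Juniper behaviour
with made-up names; BIRD does not implement it as far as I can tell.

```c
#include <assert.h>

enum bfd_down_action {
  ACT_CLOSE_SESSION,  /* no graceful restart in progress: regular teardown */
  ACT_FLUSH_ROUTES,   /* C bit was 1: forwarding plane is considered down */
  ACT_MARK_STALE      /* C bit was 0: maybe only the control plane is down */
};

/* What a BGP speaker does when the BFD session to a peer goes down.
 * `gr_helper_active` is nonzero while acting as graceful restart helper;
 * `peer_c_bit` is the C bit last received from the peer (0 or 1). */
enum bfd_down_action on_bfd_down(int gr_helper_active, int peer_c_bit)
{
  assert(peer_c_bit == 0 || peer_c_bit == 1);

  if (!gr_helper_active)
    return ACT_CLOSE_SESSION;

  return peer_c_bit ? ACT_FLUSH_ROUTES : ACT_MARK_STALE;
}
```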

Unrelated to BGP restart but related to BFD, if one BGP peer has a
temporary network issue, BFD will quickly close the session and then
require a startup delay for the session. When the network outage is
solved and one peer tries to reconnect, the session is rejected because
of this startup delay:

2016-12-02 11:03:55  R1: State changed to start
2016-12-02 11:03:55  R1: Startup delayed by 60 seconds due to errors
2016-12-02 11:04:02  R1: Incoming connection from 192.0.2.1 (port 49205) rejected
2016-12-02 11:04:07  R1: Incoming connection from 192.0.2.1 (port 36449) rejected

The delay can be configured to a lower value, but is it the expected
behavior? The current code is:

acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
  (p->start_state >= BSS_CONNECT) && (!p->incoming_conn.sk);

Could this be changed to?

acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
  (p->start_state >= BSS_DELAY) && (!p->incoming_conn.sk);
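
To show what the change would do, here is a standalone model of both
conditions. The enum values and their ordering (BSS_PREPARE <
BSS_DELAY < BSS_CONNECT) are my assumption about the BIRD headers, and
the function names are made up.

```c
#include <assert.h>

/* Assumed to mirror BIRD's constants; the ordering is what matters. */
enum proto_state { PS_DOWN, PS_START, PS_UP, PS_STOP };
enum bgp_start_state { BSS_PREPARE, BSS_DELAY, BSS_CONNECT };

/* Current condition: incoming connections are refused during the
 * startup delay. */
int accept_current(enum proto_state ps, enum bgp_start_state ss,
                   int has_incoming)
{
  return (ps == PS_START || ps == PS_UP) &&
         (ss >= BSS_CONNECT) && !has_incoming;
}

/* Proposed condition: also accept while waiting out the delay. */
int accept_proposed(enum proto_state ps, enum bgp_start_state ss,
                    int has_incoming)
{
  assert(ss >= BSS_PREPARE && ss <= BSS_CONNECT);
  return (ps == PS_START || ps == PS_UP) &&
         (ss >= BSS_DELAY) && !has_incoming;
}
```

The two only differ for a session in PS_START with BSS_DELAY, which is
exactly the state shown in the log above.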

I have put a more detailed summary of my investigations here:
 
https://github.com/vincentbernat/network-lab/tree/caceb38e8543ec22a7693611bbd84cdf36e92e12/lab-bgp-graceful-restart
-- 
Use uniform input formats.
- The Elements of Programming Style (Kernighan & Plauger)


Re: Bird upgrade kills all sessions for minutes.

2016-10-11 Thread Vincent Bernat
 ❦ 11 October 2016 11:03 CEST, Alexander Morlang:

> In our careful evaluation, we think, using the -R parameter of
> dh_installinit could ease our suffering, resulting in following patch:
>
> diff -ruN bird-1.6.2.old/debian/rules bird-1.6.2/debian/rules
> --- bird-1.6.2.old/debian/rules   2016-09-29 20:49:00.0 +0200
> +++ bird-1.6.2/debian/rules   2016-10-11 10:27:00.717429976 +0200
> @@ -69,8 +69,8 @@
>   dh_strip -O--dbgsym-migration='bird-dbg (<< 1.6.0-2~), 
> bird-bgp-dbg (<< 1.6.0-2~)'
>
> override_dh_installinit:
> - dh_installinit --name=bird
> - dh_installinit --name=bird6
> + dh_installinit --name=bird -R
> + dh_installinit --name=bird6 -R

You should report the bug against Debian since this is where the
packaging is done. Your patch is likely to be accepted since this is
already what is done for systemd.
-- 
Make it right before you make it faster.
- The Elements of Programming Style (Kernighan & Plauger)



Re: [PATCH] Netlink: add krt_scope attribute

2016-09-15 Thread Vincent Bernat
 ❦ 15 September 2016 18:05 CEST, Vincent Bernat <vinc...@bernat.im>:

> It would have been neat to be able to use "SCOPE_LINK" in a filter
> or to show "SCOPE_LINK" when displaying the route, but currently, only
> the numerical value can be used.

Any hint for this part would be welcome. The T_ENUM_SCOPE is ignored and
SCOPE_LINK is considered as non-int for some reason.
-- 
Take care to branch the right way on equality.
- The Elements of Programming Style (Kernighan & Plauger)



[PATCH] Netlink: add krt_scope attribute

2016-09-15 Thread Vincent Bernat
The general-purpose scope attribute is detailed in the documentation as
an attribute that BIRD won't set and won't use. Therefore, to expose the
scope of a kernel route, this commit adds a new attribute,
krt_scope. Its possible values are the numeric values for SCOPE_*
variables (1 is SCOPE_LINK for example), not the values from the
kernel (253 is RT_SCOPE_LINK). Both import and export are supported.

A typical use case is to install "scope link" routes for directly
connected destinations, as the kernel won't accept routes with a broader
scope when issuing an internal lookup.

It would have been neat to be able to use "SCOPE_LINK" in a filter
or to show "SCOPE_LINK" when displaying the route, but currently, only
the numerical value can be used.

Signed-off-by: Vincent Bernat <vinc...@bernat.im>
---
 sysdep/linux/krt-sys.h |  1 +
 sysdep/linux/netlink.Y |  3 ++-
 sysdep/linux/netlink.c | 40 ++--
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/sysdep/linux/krt-sys.h b/sysdep/linux/krt-sys.h
index 96688e34c860..6d6586d10f02 100644
--- a/sysdep/linux/krt-sys.h
+++ b/sysdep/linux/krt-sys.h
@@ -36,6 +36,7 @@ static inline struct ifa * kif_get_primary_ip(struct iface *i) { return NULL; }
 
 #define EA_KRT_PREFSRC EA_CODE(EAP_KRT, 0x10)
 #define EA_KRT_REALM   EA_CODE(EAP_KRT, 0x11)
+#define EA_KRT_SCOPE   EA_CODE(EAP_KRT, 0x12)
 
 
 #define KRT_METRICS_MAX        0x10    /* RTAX_QUICKACK+1 */
diff --git a/sysdep/linux/netlink.Y b/sysdep/linux/netlink.Y
index a1c22f3ece17..c1c9503c33ec 100644
--- a/sysdep/linux/netlink.Y
+++ b/sysdep/linux/netlink.Y
@@ -10,7 +10,7 @@ CF_HDR
 
 CF_DECLS
 
-CF_KEYWORDS(KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_MTU, KRT_WINDOW,
+CF_KEYWORDS(KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_SCOPE, KRT_MTU, KRT_WINDOW,
             KRT_RTT, KRT_RTTVAR, KRT_SSTRESH, KRT_CWND, KRT_ADVMSS, KRT_REORDERING,
             KRT_HOPLIMIT, KRT_INITCWND, KRT_RTO_MIN, KRT_INITRWND, KRT_QUICKACK,
             KRT_LOCK_MTU, KRT_LOCK_WINDOW, KRT_LOCK_RTT, KRT_LOCK_RTTVAR,
@@ -28,6 +28,7 @@ kern_sys_item:
 
 CF_ADDTO(dynamic_attr, KRT_PREFSRC { $$ = f_new_dynamic_attr(EAF_TYPE_IP_ADDRESS, T_IP, EA_KRT_PREFSRC); })
 CF_ADDTO(dynamic_attr, KRT_REALM   { $$ = f_new_dynamic_attr(EAF_TYPE_INT, T_INT, EA_KRT_REALM); })
+CF_ADDTO(dynamic_attr, KRT_SCOPE   { $$ = f_new_dynamic_attr(EAF_TYPE_INT, T_ENUM_SCOPE, EA_KRT_SCOPE); })
 
 CF_ADDTO(dynamic_attr, KRT_MTU { $$ = f_new_dynamic_attr(EAF_TYPE_INT, T_INT, EA_KRT_MTU); })
 CF_ADDTO(dynamic_attr, KRT_WINDOW  { $$ = f_new_dynamic_attr(EAF_TYPE_INT, T_INT, EA_KRT_WINDOW); })
diff --git a/sysdep/linux/netlink.c b/sysdep/linux/netlink.c
index 9bdcc0d2ff19..9f73e0feeeaf 100644
--- a/sysdep/linux/netlink.c
+++ b/sysdep/linux/netlink.c
@@ -900,6 +900,15 @@ nl_send_route(struct krt_proto *p, rte *e, struct ea_list *eattrs, int op, int d
   r.r.rtm_dst_len = net->n.pxlen;
   r.r.rtm_protocol = RTPROT_BIRD;
   r.r.rtm_scope = RT_SCOPE_UNIVERSE;
+  if (ea = ea_find(eattrs, EA_KRT_SCOPE))
+switch (ea->u.data)
+  {
+  case SCOPE_HOST: r.r.rtm_scope = RT_SCOPE_HOST; break;
+  case SCOPE_LINK: r.r.rtm_scope = RT_SCOPE_LINK; break;
+  case SCOPE_SITE: r.r.rtm_scope = RT_SCOPE_SITE; break;
+  case SCOPE_UNIVERSE: r.r.rtm_scope = RT_SCOPE_UNIVERSE; break;
+  }
+
   nl_add_attr_ipa(&r.h, sizeof(r), RTA_DST, net->n.prefix);
 
   /*
@@ -1157,7 +1166,6 @@ nl_parse_route(struct nl_parse_state *s, struct nlmsghdr *h)
return;
 }
 
-
   if (a[RTA_DST])
 {
   memcpy(&dst, RTA_DATA(a[RTA_DST]), sizeof(dst));
@@ -1195,11 +1203,6 @@ nl_parse_route(struct nl_parse_state *s, struct nlmsghdr *h)
   if ((c < 0) || !(c & IADDR_HOST) || ((c & IADDR_SCOPE_MASK) <= SCOPE_LINK))
 SKIP("strange class/scope\n");
 
-  // ignore rtm_scope, it is not a real scope
-  // if (i->rtm_scope != RT_SCOPE_UNIVERSE)
-  //   SKIP("scope %u\n", i->rtm_scope);
-
-
   switch (i->rtm_protocol)
 {
 case RTPROT_UNSPEC:
@@ -1304,6 +1307,27 @@ nl_parse_route(struct nl_parse_state *s, struct nlmsghdr *h)
   return;
 }
 
+  int krt_scope = -1;
+  switch (i->rtm_scope)
+{
+case RT_SCOPE_HOST: krt_scope = SCOPE_HOST; break;
+case RT_SCOPE_LINK: krt_scope = SCOPE_LINK; break;
+case RT_SCOPE_SITE: krt_scope = SCOPE_SITE; break;
+case RT_SCOPE_UNIVERSE: krt_scope = SCOPE_UNIVERSE; break;
+}
+  if (krt_scope != -1)
+{
+  ea_list *ea = lp_alloc(s->pool, sizeof(ea_list) + sizeof(eattr));
+  ea->next = ra->eattrs;
+  ra->eattrs = ea;
+  ea->flags = EALF_SORTED;
+  ea->count = 1;
+  ea->attrs[0].id = EA_KRT_SCOPE;
+  ea->attrs[0].flags = 0;
+  ea->attrs[0].type = EAF_TYPE_INT;
+  ea->attrs[0].u.data = krt_scope;
+}
+
   if (a[RTA_PREFSRC])
 {