Re: evpn rebase to HEAD

Pim van Pelt via Bird-users Fri, 20 Feb 2026 04:28:35 -0800

Hoi,

On 20.02.2026 01:31, Ondrej Zajicek wrote:

That is the fake interface from if_get_by_name(). Using them in route
nexthops is 'fine' on the level that it does not crash due to NULL
dereference, but they were never supposed to be used this way; they are
just placeholders for configuration.


Note that these fake interfaces are a horrible hack in BIRD code, as
properly there should be two distinct structures: iface_config and
iface, the former representing interface referenced in config file, and
the latter representing real kernel interfaces found by 'device' protocol.
But we use the same structure for both cases.

Understood - once iface_config and iface are split, I can make use of either construct (the iface_config one makes more sense). Neither the interface name nor the kernel device is necessary in my implementation.

I wonder if your setup would work, if you instead of using this fake interface
use some real placeholder interface, say loopback:

'encapsulation vxlan { tunnel device "lo"; };'

It works fine. As an aside, reconfiguring causes a restart of the evpn protocol, which trips an assertion and crashes. The crash also happens on 'birdc disable evpn1'.

Feb 20 12:12:29 vpp0-3 bird[1455113]: Restarting protocol evpn1
Feb 20 12:12:29 vpp0-3 bird[1455113]: Assertion 'pub->queue && pub->topic' failed at lib/pubsub.c:161
Feb 20 12:12:29 vpp0-3 systemd[1]: bird-dataplane.service: Main process exited, code=killed, status=11/SEGV

Either way, Bird comes back up and works just fine using tunnel_dev set to "lo". It reminds me that I already use this trick, as MAC addresses learned from VPP's bridge-domain do not have any corresponding Linux or Bird interface, so I inject them into etab using "lo" as well.

The 'cheat' has to be modified: it should wait for the interface,
but ignore the fact that the interface is not a tunnel (i.e.
skip/ignore evpn_validate_iface_attrs()).

I like that. Perhaps a keyword in the config can signal that this is OK, like 'tunnel device "evpn0-dummy" virtual;' or just 'tunnel device "lo" virtual;'

Note that you should read IMET from etab too. The EVPN protocol translates
all IMETs from evpntab to etab, otherwise even our kernel-based setup
would not work -- 'bridge' protocol that configures kernel bridge also
reads just etab.

I do not have multiple IMETs in etab, only one:
root@vpp0-0:/etc/bird# birdc show route table evpntab | grep imet
evpn imet 8298:100 0 2001:678:d78:200::3 [vpp0_3 12:12:38.484 from 2001:678:d78:200::3] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::2 [vpp0_2 11:18:21.821 from 2001:678:d78:200::2] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::1 [vpp0_1 11:18:21.253 from 2001:678:d78:200::1] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200:: unicast [evpn1 11:18:07.285] * (120)

root@vpp0-0:/etc/bird# birdc show route table etab | grep 00:00:00:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 12:12:38.484] * (80)

Perhaps I'm holding it wrong (see bird-example.conf). It would actually be super if I could rely /only/ on etab, as tracking both etab and evpntab was a fair amount of extra code.

I agree, and this split of work between 'evpn' protocol and 'bridge' protocol
(with separate 'evpn table' and 'eth table') is going to stay.

Thank you! That's great news for me.

I am happy to share the 'vppevpn' protocol with others also, as an example
'3P integration'. I do not expect it to be upstreamed into Bird2, unless
there are community requests for it.
Ondrej, do let me know if you'd like to take a sneak peek at my code (it's
in a private repo for now, as it's not ready for wider review yet, but it is
mostly functional).

Having better integration with VPP (or some other userspace dataplane)
is something we are interested in general, but i would not look at it
before i finish some other tasks (including merging EVPN) as i am rather
overwhelmed.

I can volunteer my time to write a vpp protocol (for ip4, ip6, mpls FIB and interfaces). I'll contact you separately for that, it sounds like a worthwhile project and I've kind of always wanted to do it.

(3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
When the BGP Next Hop is changed by an export filter, we lose the MPLS
labelstack. There is no way to add MPLS labelstack in filters (at least,
that I could find), so we cannot use 'next hop address X' to determine the
Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
but rather a PMSI attribute that already carries the 'router address'.

Resetting MPLS label when changing next hop is intentional, as MPLS labels are
(in general) specific to receiving routers.

There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
attribute that could be accessed in filters.

I am not sure what is your use case here to change it with filters, can
you describe it more? What about setting 'router address' in EVPN proto?

With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
1) copy that to the PMSI attribute: good
2) not do anything for MAC announcements; they will have BGP.next_hop set to
the session address.

if the previous patch in (2) is accepted, then 'router address' will be used
as BGP.next_hop, which will avoid the need to change it with filters with
(3).

Oh, i see. You are right, this should work automatically for both IMET / PMSI
and MAC.

I do not like using regular/immediate next hops here in EVPN table, as
it does not fit well semantically and requires formal device. But seems
to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
by EVPN protocol, similarly to how BGP_PMSI_TUNNEL is attached. Will do that.
Any comments?

If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni
to the etab table entry, I would simply read that and use it instead of the
bgp nexthop. That's what happens already today for IMET, as it has the
BGP.pmsi_tunnel attribute with the needed ingress-replication
2001:678:d78:200::2 mpls 10040 information. How do other vendors (say
Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding
is they use BGP next hop for that (in other words, the same as how Bird does
it today).

I think there is some confusion here. I am talking about evpntab
entries, not about etab entries. And about your patch that sets router
IP into their immediate next hop (nh.gw).

I see - then maybe I can try a different approach. The patch, I thought, makes Bird behave the same as Nokia SRLinux [1], which also sets the router IP (the local VTEP) as nexthop, but what you're saying is I should not set the /immediate/ nexthop, but rather leave that alone and set the /BGP Next Hop/? As a reminder, though, I need to be able to set an IPv4 BGP Next Hop on an IPv6 session for some RTs only, not all. See one more thought on that below.

Not sure I understand what you mean - don't we have this problem also for
kernel based vxlan? If we create a vxlan0 interface in a bridge, and set a
fdb entry onto it, we also need to know which vxlan nexthop to use. The way
I read 'evpn' and 'oz-evpn', we use the BGP nexthop for that purpose.
However, if what you're saying is you'd want to remove the BGP Next Hop and
instead have an EVPN VxLAN Next Hop attribute to populate the 'etab' gateway
field that would work just as well for me. I kind of wonder why you'd go to
the trouble obfuscating the BGP Next Hop. Don't other vendors use the same
thing (send vxlan packet to the address learned via the BGP Next Hop in
Type-2 announcements) ?

I just mean that immediate next hop fields for evpntab routes received
through BGP are irrelevant, while the BGP next hop attribute is the
important one. When 'evpn' protocol takes a route from evpntab and convert
it to etab entry, it examines BGP next hop, not immediate next hop.

OK I think I understand now.

While i agree that it should work automatically by just setting router
address in protocol evpn, i think that this setup should work even
without patches:

   protocol evpn {
     ...
     encapsulation vxlan { router address 192.0.2.1; };
   }
   protocol bgp {
     evpn { import all; export all; next hop address 192.0.2.1; };
     local 2001:db8::1 as 65512;
     neighbor 2001:db8::2 as 65512;
   }

I don't think this works for MAC. For IMET it works because that has a
custom PMSI BGP attribute (which is set to encap0->router_addr). Setting the
next hop in this way will clear the mpls labelstack. So we'd end up with:
fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
         via 192.0.2.1 on vxlan0 mpls 0
and we'd lose the VNI.

I think it will not clear the MPLS labelstack. This is not setting next
hop in filters. The difference between

   evpn { import all; export all; next hop address 192.0.2.1; };

and

   evpn { import all; export all; };

in BGP protocol export is only where the BGP next hop value is taken
from (explicitly configured one or source address from BGP session), but
route processing is the same. See bgp_update_next_hop_ip(), the
!bgp_use_next_hop(s, a) and !bgp_use_gateway(s) case.

I tried this, and you are correct that 'next hop address' works and leaves the MPLS labelstack alone:

00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
        via 192.168.10.0 on lo mpls 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
        via 192.168.10.0 on lo mpls 20040

Now let's suppose I have two evpn protocols, one with an IPv4 router address and one with an IPv6 router address. In this scenario, I can't use 'next hop address' because it'll force both to use that address family.


It yields a bad state:
1) as before, the IPv4-only evpn (VNI 20040) works

2) but now, the evpn with an IPv6 router address sends IMET with IPv6, and MAC with IPv4

00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
        via 2001:678:d78:200:: on lo mpls 10040
        Type: EVPN univ
        mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
        via 192.168.10.0 on lo mpls 10040
        Type: EVPN univ
        mpls_label: 10040
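
To make the mismatch above easy to spot in testing, here is a minimal standalone sketch (plain C with POSIX inet_pton(), not BIRD code) that checks whether the IMET and MAC next hops are in the same address family. The function names sketch_addr_family() and sketch_same_family() are made up for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return AF_INET, AF_INET6, or 0 if the string is not a valid address.
 * Hypothetical helper for illustration only; not part of BIRD. */
int sketch_addr_family(const char *addr)
{
  unsigned char buf[sizeof(struct in6_addr)];

  if (inet_pton(AF_INET, addr, buf) == 1)
    return AF_INET;
  if (inet_pton(AF_INET6, addr, buf) == 1)
    return AF_INET6;
  return 0;
}

/* Return 1 when both next hops parse and share an address family, else 0. */
int sketch_same_family(const char *imet_nh, const char *mac_nh)
{
  int a = sketch_addr_family(imet_nh);
  int b = sketch_addr_family(mac_nh);

  return a != 0 && a == b;
}
```

Fed the two next hops shown above for vlan 100 (2001:678:d78:200:: for IMET, 192.168.10.0 for MAC), sketch_same_family() returns 0, which is the bad state described.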

An obvious solution is to use a filter, like this one:
filter bgp_evpn_out {
  if (rt, 8298, 10040) ~ bgp_ext_community then { bgp_next_hop = 192.168.10.3; }
  if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::3; }
  accept;
}

template bgp T_BGP_EVPN {
  evpn { import all; export filter bgp_evpn_out; };
  local 2001:678:d78:200::3 as 65512;
}

But now the filter does destroy the MPLS labelstack, although the mpls_label attribute remains:

00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
        via 2001:678:d78:200:: on lo mpls 10040
        Type: EVPN univ
        mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
*        via 2001:678:d78:200:: on lo mpls 0*
        Type: EVPN univ
        mpls_label: 10040

00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
        via 192.168.10.0 on lo mpls 20040
        Type: EVPN univ
        mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
*        via 192.168.10.0 on lo mpls 0*
        Type: EVPN univ
        mpls_label: 20040

My conclusion was: I need to be able to apply filters without destroying the MPLS labels. If I now understand correctly, I can remove the nh.gw/nh.iface from evpn_announce_mac() and evpn_announce_imet(), but keep the change in bgp_update_next_hop_ip():

@@ -1314,19 +1310,6 @@ bgp_update_next_hop_ip(struct bgp_export_state *s, eattr *a, ea_list **to)
     }
   }

+  /* For L2VPN (EVPN): ensure MPLS label stack is set even if next hop was filter-overridden */
+  if (s->mpls && bgp_channel_is_l2vpn(s->channel) && !bgp_find_attr(*to, BA_MPLS_LABEL_STACK))
+  {
+    rta *ra = s->route->attrs;
+    if (ra->nh.labels)
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, ra->nh.label, ra->nh.labels * 4);
+    else
+    {
+      u32 label = ea_get_int(ra->eattrs, EA_MPLS_LABEL, BGP_MPLS_NULL);
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, &label, 4);
+    }
+  }

This allows the above filter to work while preserving the labelstack:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
        via 2001:678:d78:200:: on lo mpls 10040
        Type: EVPN univ
        mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
*        via 2001:678:d78:200:: on lo mpls 10040*
        Type: EVPN univ
        mpls_label: 10040

00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
        via 192.168.10.0 on lo mpls 20040
        Type: EVPN univ
        mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
*        via 192.168.10.0 on lo mpls 20040*
        Type: EVPN univ
        mpls_label: 20040
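
For anyone who wants to reason about the fallback order without a BIRD tree at hand, here is a minimal standalone sketch of the same idea: prefer the labels on the next hop, and fall back to the mpls_label attribute when the filter has wiped them. The struct, the function name, and the SKETCH_NO_LABEL sentinel are simplified stand-ins I made up for illustration; they are not BIRD's rta, bgp_set_attr_data(), or BGP_MPLS_NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LABELS 4
#define SKETCH_NO_LABEL   0xFFFFFFFFu  /* placeholder sentinel, not BIRD's BGP_MPLS_NULL */

/* Simplified stand-in for the route attributes; NOT BIRD's struct rta. */
struct sketch_route {
  uint32_t nh_label[SKETCH_MAX_LABELS]; /* labels on the immediate next hop */
  size_t   nh_labels;                   /* number of valid entries in nh_label */
  int      has_mpls_attr;               /* is the mpls_label attribute present? */
  uint32_t mpls_attr;                   /* value of the mpls_label attribute */
};

/*
 * Pick the labels to export when no label stack has been attached yet:
 * prefer the next hop's labels, else fall back to the mpls_label attribute.
 * Returns the number of labels written to out.
 */
size_t sketch_export_labels(const struct sketch_route *r, uint32_t *out, size_t out_len)
{
  assert(r != NULL && out != NULL && out_len > 0);

  if (r->nh_labels > 0) {
    size_t n = r->nh_labels;
    if (n > SKETCH_MAX_LABELS) n = SKETCH_MAX_LABELS;
    if (n > out_len) n = out_len;
    for (size_t i = 0; i < n; i++)
      out[i] = r->nh_label[i];
    return n;
  }

  /* Next hop labels were cleared (e.g. by a filter): use the attribute. */
  out[0] = r->has_mpls_attr ? r->mpls_attr : SKETCH_NO_LABEL;
  return 1;
}
```

A route whose next hop labels were cleared but which still carries mpls_label 10040 yields a single label 10040, which matches the "mpls 10040" lines in the output above.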

Of course, open to better solutions :)

groet,
Pim

[1] A:pim@asw121# show network-instance default protocols bgp routes evpn route-type 2 detail | more

Route Distinguisher: 65500:264
Tag-ID             : 0
MAC address        : 64:9D:99:D0:70:4D
IP Address         : 10.26.0.1
neighbor           : 198.19.16.0
path-id            : 0
Received paths     : 1
  Path 1: <Best,Valid,Used,>
    ESI               : 00:00:00:00:00:00:00:00:00:00
    Label             : 264
    Route source      : neighbor 198.19.16.0 (last modified 68d14h37m6s ago)
    Route preference  : No MED, LocalPref is 100
    Atomic Aggr       : false
    BGP next-hop      : 198.19.18.0
    AS Path           :  i
    Communities       : [target:65500:264, bgp-tunnel-encap:VXLAN]
    RR Attributes     : No Originator-ID, Cluster-List is []
    Aggregation       : None
    Unknown Attr      : None
    Invalid Reason    : None
    Tie Break Reason  : none
    Route Flap Damping: None

--
Pim van Pelt<[email protected]>
PBVP1-RIPE https://ipng.ch/

## Manual configuration for vpp1-0

eth table etab;
eth table etab100;
eth table etab200;
evpn table evpntab;

protocol static {
  eth { table etab; };
  route eth 00:00:01:00:00:01 vlan 100 prohibit;
  route eth 00:00:02:00:00:01 vlan 200 prohibit;
}

protocol evpn {
  debug all;
  eth { table etab; };
  evpn { import all; export all; };
  rd 8298:100;
  import target (rt, 8298, 10040);
  export target (rt, 8298, 10040);
  encapsulation vxlan {
    tunnel device "vxlan0";
    router address 2001:678:d78:200::;
  };
  vni 10040;
  vid 100;
}; 

protocol evpn {
  debug all;
  eth { table etab; };
  evpn { import all; export all; };
  rd 8298:200;
  import target (rt, 8298, 20040);
  export target (rt, 8298, 20040);
  encapsulation vxlan {
    tunnel device "vxlan0";
    router address 192.168.10.0;
  };
  vni 20040;
  vid 200;
}; 

filter bgp_evpn_out {
#  if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::; }
  accept;
}

template bgp T_BGP_EVPN {
  evpn { import all; export filter bgp_evpn_out; };
  local 2001:678:d78:200:: as 65512;
}

protocol bgp vpp0_1 from T_BGP_EVPN { neighbor 2001:678:d78:200::1 as 65512; }
protocol bgp vpp0_2 from T_BGP_EVPN { neighbor 2001:678:d78:200::2 as 65512; }
protocol bgp vpp0_3 from T_BGP_EVPN { neighbor 2001:678:d78:200::3 as 65512; }

protocol vppevpn bd100 {
  debug all;
  eth { table etab; import all; export all; };

  vxlan ipv6 src 2001:678:d78:200::;
  vxlan ipv4 src 192.168.10.0;
  vxlan src port 4789;
  vxlan dst port 4789;
  bridge domain 100;
  scan time 5;
  vid 100;
  vni 10040;
};

protocol vppevpn bd200 {
  debug all;
  eth { table etab; import all; export all; };

  vxlan ipv6 src 2001:678:d78:200::;
  vxlan ipv4 src 192.168.10.0;
  vxlan src port 4789;
  vxlan dst port 4789;
  bridge domain 200;
  bridge mac age 10;
  scan time 5;
  vid 200;
  vni 20040;
};
