On Thu, Feb 19, 2026 at 11:11:05PM +0100, Pim van Pelt wrote:
> Hoi,
> 
> Thanks for your time Ondrej, and apologies Maria for mistyping your name,
> Mrs IPng Networks is called Marina so that kind of just rolls off the
> keyboard sometimes :)
> 
> On 19.02.2026 18:04, Ondrej Zajicek wrote:
> > On Wed, Feb 18, 2026 at 09:59:05PM +0100, Pim van Pelt wrote:
> > > Hoi,
> > > 
> > > Thanks for taking a look, Marina and Ondrej, I appreciate it!
> > > 
> > > On 18.02.2026 17:50, Ondrej Zajicek wrote:
> > > > As others noted, the relevant branch is 'oz-evpn', the older 'evpn'
> > > > branch fell victim to my needlessly strict adherence to "do not rebase
> > > > public branch" rule. The patches in 'oz-evpn' are not only rebased on
> > > > newer BIRD version, but also have fixes squashed in them, and there is
> > > > newer development. I just pushed there rebase to 2.18. Please look at
> > > > this branch first. Also note there are some minor changes to EVPN
> > > > protocol configuration syntax.
> > > I have ported my vppevpn protocol implementation to be based on oz-evpn,
> > > and the system is functional here also. Yaay!
> > > 
> > > I only had one small issue. In oz-evpn, the 'evpn' protocol will stay in
> > > 'startup' until the vxlan0 interface becomes ready. However, in my
> > > usecase, vxlan is not performed by the kernel, but by VPP, so there is no
> > > 'vxlan0' interface. I need only 'vni' and 'router address' (and the
> > > remote VTEP) to construct the dataplane configuration. To allow the evpn
> > > protocol to transition to PS_UP, I decided to fire an event that
> > > announces the IMET if router_addr and VNI are set, and skips waiting for
> > > the interface.
> > Hmm, you have NULL interface in the encap->tunnel_dev? Or some fake
> > interface created by if_get_by_name()? Or some dummy/irrelevant interface
> > (loopback)?
>
> I do specify an 'encapsulation vxlan { tunnel device "vxlan0";};'. It
> satisfies Bird2 by having an interface, it just doesn't exist in the kernel.
> In branch 'evpn' this was fine, in branch 'oz-evpn' this needs me to cheat a
> bit because we're waiting on the device to be oper-up and enslaved to the
> bridge. If I skip that part, everything works fine without any kernel
> interaction. See below in [1] for my cheat.

That is the fake interface from if_get_by_name(). Using them in route
nexthops is 'fine' on the level that it does not crash due to NULL
dereference, but they were never supposed to be used this way, they are
just placeholders for configuration.

Note that these fake interfaces are a horrible hack in BIRD code, as
properly there should be two distinct structures: iface_config and
iface, the former representing interface referenced in config file, and
the latter representing real kernel interfaces found by 'device' protocol.
But we use the same structure for both cases.
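
As a rough sketch of that split (all names here are hypothetical and
simplified, none of this is existing BIRD code), the config-side reference
would only become usable in nexthops after it is bound to a real interface:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a real interface as reported by the 'device' protocol. */
struct iface {
  const char *name;
  unsigned index;               /* kernel ifindex */
  bool up;
};

/* Hypothetical: an interface as referenced by name in the config file. */
struct iface_config {
  const char *name;             /* name as written in the config */
  struct iface *resolved;       /* NULL until a matching real interface appears */
};

/* Bind a config reference to a real interface if the names match. */
static bool iface_config_bind(struct iface_config *c, struct iface *i)
{
  if (strcmp(c->name, i->name) != 0)
    return false;
  c->resolved = i;
  return true;
}

/* A config reference may be used in nexthops only once bound and up. */
static bool iface_config_usable(const struct iface_config *c)
{
  return c->resolved != NULL && c->resolved->up;
}
```

With such a split, a reference to a non-existent "vxlan0" would simply stay
unresolved instead of posing as an interface.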

I wonder if your setup would work if, instead of this fake interface, you
used some real placeholder interface, say loopback:

'encapsulation vxlan { tunnel device "lo"; };'

The 'cheat' would have to be modified: it should wait for the interface,
but ignore the fact that the interface is not a tunnel (i.e.
skip/ignore evpn_validate_iface_attrs()).

I am thinking about how to integrate your use case into oz-evpn, and
this seems to me like a much saner way.


> > Note that the nexthops of VXLAN-tunneled routes in bridge table are just
> > makeshift now, esp. usage of nh.gw for encap-dst-ip and nh->label[0] for
> > encap-vni, these should get their own attributes (once we will redesign
> > nexthops to have proper attributes).
>
> The information I needed for my usecase, is nexthop '192.168.10.2', and
> mpls_label '20040' from etab, and IMET from evpntab (because in P2MP there
> will be multiple IMETs and etab will only carry one of them).

Note that you should read IMET from etab too. The EVPN protocol translates
all IMETs from evpntab to etab, otherwise even our kernel-based setup
would not work -- the 'bridge' protocol that configures the kernel bridge
also reads just etab.


> I've implemented also 'vid', as you see above 200, but it carries no meaning
> for VPP because the bridge-domain can be separately configured to allow
> untagged, single-tagged or double-tagged in the PE interfaces. If new
> attributes (like the vxlan nexthop or vxlan vni you suggest below) were to
> appear, it will be easy for me to switch to using them instead.

okay.


> > I am often uncertain how much BIRD representation of routes should match
> > Linux API representation of routes (esp. for idiosyncratic details like
> > here when Linux API assumes nominal tunnel interfaces in next hop
> > interfaces for lightweight tunnels), but i usually defer to try to keep
> > it consistent to limit impedance mismatch here. But it may cause
> > problems when other backends with different conventions are used, like
> > in your case.
> I think assuming by default a linux 'bridge' with its tunneling
> functionality is perfectly fine, although I'd prefer it if it does not
> become the /only/ valid way:
> 1) I'm not sure if that works well on other platforms (eg FreeBSD, Windows,
> MacOS)
> 2) or embedded platforms (eg Broadcom or Marvell chips).
> 3) or VPP :-)
> 
> Requiring a linux bridge, and requiring a kernel interface, prohibits
> non-linux eVPN scenarios. May I suggest that these things are kept optional
> even if they are the default, but that they can be turned off, for example
> by configuring a dummy interface dummy0, setting a config toggle 'nowait' to
> skip waiting for it to be oper-up/enslaved, and that we also do not require
> 'bridge' protocol ?

Yeah, that is probably how it will be. Assume Linux networking model,
but do not depend on it.


> > Btw, i planned to explicitly configure bridge device for EVPN protocol
> > (as it is now implicitly through tunnel_dev->master). The idea is that as
> > VRF device (in Linux) defines L3 VRF, bridge device defines MAC-VRF. And
> > as L3 protocols are associated with specific L3 VRF, L2 protocols should
> > be associated with specific MAC-VRF.
> It would be good if 'evpn' protocol can continue to be used standalone, in
> particular not conflate with 'bridge'. In my view, one should be able to
> inspect evpntab and etab to construct other integrations without the need to
> consult kernel devices. At the moment, 'evpn' entirely so and less so
> 'oz-evpn' are elegant precisely because it does complete signalling and
> captures evpntab and etab using exclusively one 'evpn' and 'bgp' protocol
> together with the 'evpn table' and 'eth table'. It allows me to create a
> custom 'vppevpn' protocol that subscribes to those tables. See attached
> config file (bird-example.conf) for an idea of where I'm headed.

I agree, and this split of work between 'evpn' protocol and 'bridge' protocol
(with separate 'evpn table' and 'eth table') is going to stay.


> I am happy to share the 'vppevpn' protocol with others also, as an example
> '3P integration'. I do not expect it to be upstreamed into Bird2, unless
> there are community requests for it.
> Ondrej, do let me know if you'd like to take a sneak peek at my code (it's
> in a private repo for now, as it's not ready for wider review yet, but it is
> mostly functional).

Having better integration with VPP (or some other userspace dataplane)
is something we are generally interested in, but i would not look at it
before i finish some other tasks (including merging EVPN) as i am rather
overwhelmed.


> > > > > (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
> > > > > When the BGP Next Hop is changed by an export filter, we lose the MPLS
> > > > > labelstack. There is no way to add MPLS labelstack in filters (at
> > > > > least, that I could find), so we cannot use 'next hop address X' to
> > > > > determine the Type-2 MAC VxLAN endpoint. Note: IMET updates do not use
> > > > > the BGP Next Hop, but rather a PMSI attribute with the 'router
> > > > > address' already.
> > > > Resetting MPLS label when changing next hop is intentional, as MPLS
> > > > labels are (in general) specific to receiving routers.
> > > > 
> > > > There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
> > > > attribute that could be accessed in filters.
> > > > 
> > > > I am not sure what is your use case here to change it with filters, can
> > > > you describe it more? What about setting 'router address' in EVPN proto?
> > > With the oz-evpn branch as-is, setting 'router address' in evpn proto
> > > will:
> > > 1) copy that to the PMSI attribute: good
> > > 2) not do anything for MAC announcements; they will have BGP.next_hop set
> > > to the session address.
> > > 
> > > if the previous patch in (2) is accepted, then 'router address' will be
> > > used as BGP.next_hop, which will avoid the need to change it with filters
> > > with (3).
> > Oh, i see. You are right, this should work automatically for both IMET /
> > PMSI and MAC.
> > 
> > I do not like using regular/immediate next hops here in EVPN table, as
> > it does not fit well semantically and requires formal device. But seems
> > to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
> > by EVPN protocol, similarly to how BGP_PMSI_TUNNEL is attached. Will do
> > that. Any comments?
>
> If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni
> to the etab table entry, I would simply read that and use it instead of the
> bgp nexthop. That's what happens already today for IMET, as it has the
> BGP.pmsi_tunnel attribute with the needed ingress-replication
> 2001:678:d78:200::2 mpls 10040 information. How do other vendors (say
> Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding
> is they use BGP next hop for that (in other words, the same as how Bird does
> it today).

I think there is some confusion here. I am talking about evpntab
entries, not about etab entries. And about your patch that sets router
IP into their immediate next hop (nh.gw).


> > Note that immediate next hops in EVPN table for routes received through
> > BGP are here just as an artefact of BGP_NEXT_HOP resolvability check,
> > they should not be here too.
>
> Not sure I understand what you mean - don't we have this problem also for
> kernel based vxlan? If we create a vxlan0 interface in a bridge, and set a
> fdb entry onto it, we also need to know which vxlan nexthop to use. The way
> I read 'evpn' and 'oz-evpn', we use the BGP nexthop for that purpose.
> However, if what you're saying is you'd want to remove the BGP Next Hop and
> instead have an EVPN VxLAN Next Hop attribute to populate the 'etab' gateway
> field that would work just as well for me. I kind of wonder why you'd go to
> the trouble obfuscating the BGP Next Hop. Don't other vendors use the same
> thing (send vxlan packet to the address learned via the BGP Next Hop in
> Type-2 announcements) ?

I just mean that immediate next hop fields for evpntab routes received
through BGP are irrelevant, while the BGP next hop attribute is the
important one. When 'evpn' protocol takes a route from evpntab and converts
it to an etab entry, it examines BGP next hop, not immediate next hop.

bird> show route table evpntab all
Table evpntab:
evpn mac 1:22 2 76:a0:cf:05:dd:4f * unicast [ibgp1 00:06:40.215 from 10.1.2.1] * (100/20) [i]
    via 10.1.21.2 on ve1 mpls 200022
    Type: BGP univ
    BGP.origin: IGP
    BGP.as_path: 
    BGP.next_hop: 10.1.2.1
    BGP.local_pref: 100
    BGP.ext_community: (rt, 1, 0) (generic, 0x30c0000, 0x8)
    BGP.mpls_label_stack: 200022

Here immediate next hop is 10.1.21.2, while BGP next hop is 10.1.2.1 (two
hops away).

If EVPN had been encapsulated in MPLS, then it would have made sense to
show the immediate next hop, but in case of VXLAN encapsulation, the
traffic is encapsulated and sent to BGP next hop.
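
To make the distinction concrete, here is a minimal sketch using the
addresses from the output above. The structure and function names are
hypothetical and IPv4-only, this is an illustration and not how the 'evpn'
protocol is actually written:

```c
#include <stdint.h>

/* Pack a dotted-quad IPv4 address into a host-order integer. */
#define IP4(a, b, c, d) \
  (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

enum encap_type { ENCAP_MPLS, ENCAP_VXLAN };

/* Hypothetical, simplified view of one evpntab route. */
struct evpn_route {
  uint32_t immediate_gw;   /* resolved immediate next hop, e.g. 10.1.21.2 */
  uint32_t bgp_next_hop;   /* BGP.next_hop attribute, e.g. 10.1.2.1 */
  uint32_t label;          /* first entry of BGP.mpls_label_stack */
};

/* Address that is relevant for forwarding, depending on encapsulation:
 * VXLAN traffic goes to the BGP next hop, while for MPLS the immediate
 * next hop is the meaningful one. */
static uint32_t encap_destination(const struct evpn_route *r, enum encap_type e)
{
  return (e == ENCAP_VXLAN) ? r->bgp_next_hop : r->immediate_gw;
}
```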


> > While i agree that it should work automatically by just setting router
> > address in protocol evpn, i think that this setup should work even
> > without patches:
> > 
> >   protocol evpn {
> >     ...
> >     encapsulation vxlan { router address 192.0.2.1; };
> >   }
> >   protocol bgp {
> >     evpn { import all; export all; next hop address 192.0.2.1; };
> >     local 2001:db8::1 as 65512;
> >     neighbor 2001:db8::2 as 65512;
> >   }
> I don't think this works for MAC, for IMET it works because that has a
> custom PMSI BGP attribute (which is set to encap0->router_addr). Setting the
> next hop in this way will clear the mpls labelstack. So we'd end up with:
> fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
>         via 192.0.2.1 on vxlan0 mpls 0
> and we'd lose the VNI.

I think it will not clear the MPLS labelstack. This is not setting next
hop in filters. The difference between

  evpn { import all; export all; next hop address 192.0.2.1; };

and 

  evpn { import all; export all; };

in BGP protocol export is only where the BGP next hop value is taken
from (explicitly configured one or source address from BGP session), but
route processing is the same. See bgp_update_next_hop_ip(), the
!bgp_use_next_hop(s, a) and !bgp_use_gateway(s) case.
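
A toy model of that claim, with hypothetical names and opaque 32-bit values
standing in for addresses. It is deliberately not a rendering of
bgp_update_next_hop_ip(), only of the point that the label is untouched in
both cases:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified channel settings; not BIRD's structures. */
struct chan_cfg {
  bool has_next_hop_addr;    /* 'next hop address X' is configured */
  uint32_t next_hop_addr;    /* the configured address X */
  uint32_t session_source;   /* source address of the BGP session */
};

/* What ends up in the exported update in this toy model. */
struct exported {
  uint32_t next_hop;
  uint32_t label;
};

/* Only the source of the next hop value differs between the two
 * configurations; the label is copied unchanged in either case. */
static struct exported export_route(const struct chan_cfg *c, uint32_t label)
{
  struct exported e;
  e.next_hop = c->has_next_hop_addr ? c->next_hop_addr : c->session_source;
  e.label = label;
  return e;
}
```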


-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: [email protected])
"To err is human -- to blame it on a computer is even more so."
