Hi,

Thanks for taking a look, Marina and Ondrej, I appreciate it!

On 18.02.2026 17:50, Ondrej Zajicek wrote:
As others noted, the relevant branch is 'oz-evpn', the older 'evpn'
branch fell victim to my needlessly strict adherence to "do not rebase
public branch" rule. The patches in 'oz-evpn' are not only rebased on
newer BIRD version, but also have fixes squashed in them, and there is
newer development. I just pushed a rebase to 2.18 there. Please look at
this branch first. Also note there are some minor changes to EVPN protocol
configuration syntax.
I have ported my vppevpn protocol implementation to oz-evpn, and the system works here too. Yaay!

I only had one small issue. In oz-evpn, the 'evpn' protocol stays in 'startup' until the vxlan0 interface becomes ready. In my use case, however, VXLAN is handled by VPP rather than the kernel, so there is no 'vxlan0' interface. I only need the 'vni' and 'router address' (plus the remote VTEP) to construct the dataplane configuration. To let the evpn protocol transition to PS_UP, I schedule an event that announces the IMET when router_addr and VNI are both set, and skips waiting for the interface.

See inline -
On 14.02.2026 12:49, Pim van Pelt via Bird-users wrote:
I've started to toy with VPP and eVPN/VxLAN, and took a look at the evpn
branch from a few years ago.
For my network, I'll need the OSPFv3 'unnumbered' features we built, so
I thought I'd ask - would it be possible to rebase the evpn branch ?
I've taken a stab at it (see attached patch) by replaying the 9 commits
on top of HEAD (f1a7229d-evpn.diff).

It may not be correct, but it does compile and seemingly work 🙂
I have played around with this 2.18+evpn rebase and created a working
eVPN/VxLAN with VPP. I stumbled across a few specifics which I'd like to
share:

(1) EVPN exports cause the following assertion failure:
Assertion '!((a->flags ^ desc->flags) & (BAF_OPTIONAL | BAF_TRANSITIVE))'
failed at proto/bgp/attrs.c:1269

evpn_announce_mac() and evpn_announce_imet() were using ea_set_attr_ptr()
with flags=0 to set BGP attributes BA_EXT_COMMUNITY and BA_PMSI_TUNNEL.
Those attributes have descriptor flags BAF_OPTIONAL | BAF_TRANSITIVE, and
when BGP's bgp_export_attr() processes those attributes during update
encoding, it trips the assertion.

This patch switches to bgp_set_attr_ptr() which automatically normalizes
flags from the descriptor table, ensuring the stored attribute flags always
match what the descriptor expects. Compare to l3vpn.c which correctly passed
BAF_OPTIONAL | BAF_TRANSITIVE explicitly, this feels cleaner.
Already fixed in oz-evpn. I would prefer not to use bgp_set_attr() outside BGP
and we already have another approach to attribute handling in BIRD 3, so I kept
the ea_set_attr_ptr() functions here.
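
For anyone following the thread, the condition behind that assertion can be reproduced in isolation. The sketch below is a standalone illustration, not BIRD code: the two flag values follow the path-attribute flag bits from RFC 4271, and attr_flags_match() is a hypothetical helper name introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Path-attribute flag bits as laid out in RFC 4271, section 4.3. */
#define BAF_OPTIONAL   0x80
#define BAF_TRANSITIVE 0x40

/* Hypothetical helper mirroring the asserted condition: the Optional and
 * Transitive bits stored on the attribute must equal those in the
 * attribute descriptor. Other flag bits are ignored by the mask. */
static bool
attr_flags_match(uint8_t attr_flags, uint8_t desc_flags)
{
  return !((attr_flags ^ desc_flags) & (BAF_OPTIONAL | BAF_TRANSITIVE));
}
```

With flags=0 on the stored attribute and BAF_OPTIONAL | BAF_TRANSITIVE in the descriptor, the XOR leaves both bits set, so the check fails. Either passing the flags explicitly or normalizing them from the descriptor makes it pass.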


See bird2.18+evpn_use_bgp_set_attr.diff for a possible fix.
(2) BGP Next Hop for Type-2 should be the 'router address' from evpn
protocol.
When announcing an IPv4 vxlan evpn on an IPv6 BGP session, default behavior
is to set the next hop using the BGP session. This means the MAC nexthops
will be IPv6, not 'router address'. Moreover, changing this with 'next hop
address X' is not possible, because overriding the next-hop will remove the
MPLS label (which carries the VNI).

Under the assumption that whatever 'router address' is in the evpn protocol
context will determine:
1) the PMSI [already correctly added even if the nexthop is a different
family, here it does not matter]
2) the BGP next hops for Type-2 (MAC) announcements [where it matters if the
evpn vxlan address family differs from the BGP session address family]

This patch fixes the latter: setting the BGP next hop to the 'router
address' field for evpn_announce_mac() and for consistency also for
evpn_announce_imet()
See bird2.18+evpn_use_routeraddr_as_bgp_nexthop.diff for a reasonable
default.
Will look at this more.


(3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
When the BGP Next Hop is changed by an export filter, we lose the MPLS
labelstack. There is no way to add MPLS labelstack in filters (at least,
that I could find), so we cannot use 'next hop address X' to determine the
Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
but rather a PMSI attribute with the 'router address' already.
Resetting MPLS label when changing next hop is intentional, as MPLS labels are
(in general) specific to receiving routers.

There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
attribute that could be accessed in filters.

I am not sure what is your use case here to change it with filters, can
you describe it more? What about setting 'router address' in EVPN proto?
With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
1) copy that to the PMSI attribute: good
2) not do anything for MAC announcements; they will have BGP.next_hop set to the session address.

If the previous patch in (2) is accepted, then 'router address' will be used as BGP.next_hop, which avoids the need to change it with filters as in (3).
If neither patch is applied, the following config:

protocol evpn {
  ...
  encapsulation vxlan { router address 192.0.2.1; };
}
protocol bgp {
  evpn { import all; export all; };
  local 2001:db8::1 as 65512;
  neighbor 2001:db8::2 as 65512;
}

will yield IMET pointing at 192.0.2.1 but MAC pointing at 2001:db8::1. If I want MAC pointing at 192.0.2.1 also, I would either need (2, my preference) or a filter with (3). If there exists a device out there which has different addressing for IMET and MAC (note: I don't know of any, but perhaps they exist), then (3) would come in handy.
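
To make the cases explicit, here is a minimal sketch of which address ends up as the VTEP for each route type. It is a standalone model for discussion, not code from BIRD: the enum names and vtep_source_for() are hypothetical, and I am assuming that a next hop set by an export filter takes precedence over the default from (2).

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum evpn_route_kind { EVPN_ROUTE_IMET, EVPN_ROUTE_MAC };
enum vtep_source { VTEP_FROM_ROUTER_ADDR, VTEP_FROM_SESSION, VTEP_FROM_FILTER };

/*
 * router_addr_default: patch (2) applied, next hop defaults to 'router address'.
 * filter_sets_next_hop: patch (3) applied and the export filter sets a next hop.
 */
static enum vtep_source
vtep_source_for(enum evpn_route_kind kind, bool router_addr_default,
                bool filter_sets_next_hop)
{
  /* IMET carries the 'router address' in the PMSI attribute, so the
   * BGP next hop does not matter for it. */
  if (kind == EVPN_ROUTE_IMET)
    return VTEP_FROM_ROUTER_ADDR;

  /* Assumption: an explicit filter override wins over the default. */
  if (filter_sets_next_hop)
    return VTEP_FROM_FILTER;

  if (router_addr_default)
    return VTEP_FROM_ROUTER_ADDR;

  /* Neither patch: the MAC route follows the BGP session address. */
  return VTEP_FROM_SESSION;
}
```

For the config above with neither patch, this gives 192.0.2.1 (router address) for IMET and 2001:db8::1 (session) for MAC, which is the mismatch I described.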

For completeness, here's a small diff of the changes I made: (a) allow the vxlan interface to be absent from the kernel, (b) default the next hop to 'router address', and (c) allow overriding the BGP next hop in a filter. Something like (a) is required for my use case, as is one of (b) or (c).

Regards,
Pim

--
Pim van Pelt <[email protected]>
PBVP1-RIPE https://ipng.ch/

diff --git a/proto/bgp/packets.c b/proto/bgp/packets.c
index 694d2bf5..4e3bc822 100644
--- a/proto/bgp/packets.c
+++ b/proto/bgp/packets.c
@@ -1258,13 +1258,17 @@ bgp_use_gateway(struct bgp_export_state *s)
     return 0;
 
   /* Check for non-matching AF */
-  if ((ipa_is_ip4(ra->nh.gw) != bgp_channel_is_ipv4(c)) && !c->ext_next_hop)
+  if ((ipa_is_ip4(ra->nh.gw) != bgp_channel_is_ipv4(c)) && !c->ext_next_hop && !bgp_channel_is_l2vpn(c))
     return 0;
 
   /* Do not use gateway from different VRF */
   if (p->p.vrf_set && ra->nh.iface && !if_in_vrf(ra->nh.iface, p->p.vrf))
     return 0;
 
+  /* For L2VPN (EVPN), always use the gateway as the VTEP next hop */
+  if (bgp_channel_is_l2vpn(c))
+    return 1;
+
   /* Use it when exported to internal peers */
   if (p->is_interior)
     return 1;
@@ -1310,6 +1314,19 @@ bgp_update_next_hop_ip(struct bgp_export_state *s, eattr *a, ea_list **to)
     }
   }
 
+  /* For L2VPN (EVPN): ensure MPLS label stack is set even if next hop was filter-overridden */
+  if (s->mpls && bgp_channel_is_l2vpn(s->channel) && !bgp_find_attr(*to, BA_MPLS_LABEL_STACK))
+  {
+    rta *ra = s->route->attrs;
+    if (ra->nh.labels)
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, ra->nh.label, ra->nh.labels * 4);
+    else
+    {
+      u32 label = ea_get_int(ra->eattrs, EA_MPLS_LABEL, BGP_MPLS_NULL);
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, &label, 4);
+    }
+  }
+
   /* Check if next hop is valid */
   a = bgp_find_attr(*to, BA_NEXT_HOP);
   if (!a)
diff --git a/proto/evpn/evpn.c b/proto/evpn/evpn.c
index 651a07fb..fdc915ea 100644
--- a/proto/evpn/evpn.c
+++ b/proto/evpn/evpn.c
@@ -73,6 +73,7 @@ static inline const struct adata * ea_get_adata(ea_list *e, uint id)
 
 static struct evpn_vlan *evpn_find_vlan_by_tag(struct evpn_proto *p, u32 tag);
 static struct evpn_vlan *evpn_find_vlan_by_vid(struct evpn_proto *p, u32 vid);
+static void evpn_no_iface_startup(void *data);
 
 #define EVPN_ROOT_VLAN(P) \
   ( &(struct evpn_vlan){ .tag = (P)->tagX, .vni = (P)->vni, .vid = (P)->vid } )
@@ -178,14 +179,21 @@ evpn_announce_mac(struct evpn_proto *p, const net_addr_eth *n0, rte *new)
   net_addr *n = alloca(sizeof(net_addr_evpn_mac));
   net_fill_evpn_mac(n, p->rd, v->tag, n0->mac);
 
+  struct evpn_encap *encap = SKIP_BACK(struct evpn_encap, n, HEAD(p->encaps));
+
   if (new)
   {
     rta *a = alloca(RTA_MAX_SIZE);
     *a = (rta) {
       .source = RTS_EVPN,
       .scope = SCOPE_UNIVERSE,
+      .dest = ipa_nonzero(encap->router_addr) ? RTD_UNICAST : RTD_UNREACHABLE,
       .pref = c->preference,
+      .nh.gw = encap->router_addr,
+      .nh.iface = encap->tunnel_dev,
+      .nh.labels = 1,
     };
+    a->nh.label[0] = v->vni;
 
     struct adata *ec = evpn_encap_ext_comms(p);
     struct adata *ad = evpn_export_targets(p, ec);
@@ -232,7 +240,10 @@ evpn_announce_imet(struct evpn_proto *p, struct evpn_vlan *v, int new)
     *a = (rta) {
       .source = RTS_EVPN,
       .scope = SCOPE_UNIVERSE,
+      .dest = ipa_nonzero(encap->router_addr) ? RTD_UNICAST : RTD_UNREACHABLE,
       .pref = c->preference,
+      .nh.gw = encap->router_addr,
+      .nh.iface = encap->tunnel_dev,
     };
 
     struct adata *ec = evpn_encap_ext_comms(p);
@@ -1059,11 +1070,37 @@ evpn_start(struct proto *P)
     P->mpls_map->vrf_iface = P->vrf;
   */
 
+  /* If router address and VNI are fully configured, no need to wait for
+   * the tunnel device to come up (e.g., when VPP manages VXLAN tunnels).
+   * Schedule an immediate event to transition to PS_UP. */
+  struct evpn_encap *encap0 = evpn_get_encap(p);
+  if (!ipa_zero(encap0->router_addr) && (p->vni != U32_UNDEF))
+  {
+    event *e = ev_new_init(p->p.pool, evpn_no_iface_startup, p);
+    ev_schedule(e);
+  }
+
   /* Wait for VXLAN interfaces to be up */
 
   return PS_START;
 }
 
+static void
+evpn_no_iface_startup(void *data)
+{
+  struct evpn_proto *p = data;
+
+  if (p->p.proto_state != PS_START)
+    return;
+
+  proto_notify_state(&p->p, PS_UP);
+
+  evpn_announce_imet(p, EVPN_ROOT_VLAN(p), 1);
+
+  WALK_LIST_(struct evpn_vlan, v, p->vlans)
+    evpn_announce_imet(p, v, 1);
+}
+
 static void
 evpn_started(struct evpn_proto *p, struct iface *i)
 {
