Hi Gyan,

Thanks for the comments. I will follow the recommendation w.r.t. MC-LAG.
About the problem statement: the solution that you are describing is not what 
this draft is about.
There is actually no BGP session between the host and the DAG (IP). That problem is 
solved in a different draft.
Here, the connected hosts are simple appliances NOT running BGP, and they are 
connected via L2/Ethernet to the FHR (first-hop router).
An appliance can be a CE, a switch, etc. The connected interfaces are L3 instead 
of the usual L2 interfaces.
The draft explains how to sync the ARP/ND tables and a few more things.
Regards,
Patrice Brissette
Distinguished Engineer
Cisco Systems




From: Gyan Mishra <[email protected]>
Date: Saturday, September 6, 2025 at 23:35
To: Patrice Brissette (pbrisset) <[email protected]>
Cc: [email protected] <[email protected]>, [email protected] <[email protected]>
Subject: Re: [bess] draft-mackenzie-bess-evpn-l3mh-proto
Hi Patrice & authors,

Excellent work on coming up with this solution for L3 over L2 MC-LAG. I am 
curious about the use cases and the problem this solution solves.

When I think of MC-LAG, I think of legacy proprietary implementations of 
multi-chassis LAG such as Cisco vPC or Juniper MC-LAG. In contrast, a modern 
LAG uses the EVPN fabric for ARP/ND synchronization and does not require ICCP 
or a proprietary link between the leafs.

My recommendation would be to not mention MC-LAG by itself and to call it ESI 
MC-LAG instead, which is the modern EVPN-fabric-based LAG used with MPLS or 
VXLAN fabrics.

I have some questions regarding Section 1.1 of the problem statement. My 
understanding is that BGP over ESI LAG is very common in modern VXLAN- or 
MPLS-fabric-based DCs, where the host is eBGP-peered to the RFC 9135 
inter-subnet forwarding Distributed Anycast Gateway (DAG) IP, with a single 
session hashed to the DF leaf and synchronized with the NDF leaf via the EVPN 
fabric.

Here is how it works.

The following describes how an all-active multihomed host interacts with an 
Ethernet VPN (EVPN) fabric using an anycast gateway and BGP. The mechanism 
ensures redundancy, seamless failover, and load balancing for both L2 and L3 
traffic.
Here is a breakdown of the process:
1. Anycast gateway and host peering
·         Anycast IP: A host is typically connected to two or more leaf 
switches via a Link Aggregation Group (LAG). All leaf switches connected to the 
same Ethernet Segment Identifier (ESI) share the same IP and MAC address, 
called the anycast gateway.
o    eBGP peering: The multihomed host establishes an External BGP (eBGP) 
peering session with the anycast gateway IP address. Since the IP is the same 
on both leaf switches, the host sees a single gateway.
o    Designated Forwarder (DF) election: EVPN uses a DF election algorithm to 
determine which leaf switch is the DF for a specific Ethernet segment. The DF 
is responsible for forwarding Broadcast, Unknown-unicast, and Multicast (BUM) 
traffic to the host. The other leaf is the non-DF (NDF).
o    BGP session via DF: The host's eBGP session will be established over the 
LAG member connected to the DF leaf. This is because the DF holds the active 
ARP/ND entry for the host.
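
As an illustration of the DF election step above, here is a minimal sketch of 
the default modulo-based procedure from RFC 7432 (service carving): the PEs 
attached to the segment are ordered by IP address, and the PE whose ordinal 
equals (VLAN mod N) becomes the DF. The function name and the leaf addresses 
are hypothetical, and a given fabric may run a different election algorithm.

```python
import ipaddress


def elect_df(pe_addresses, vlan_id):
    """Return the PE elected as DF for vlan_id (RFC 7432 default procedure)."""
    # Order the candidate PEs by numeric IP address value, ascending.
    # Each PE's ordinal is its position in this list, starting at 0.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # The PE whose ordinal equals (V mod N) is the DF for that VLAN.
    return ordered[vlan_id % len(ordered)]


# Hypothetical leaf loopbacks (documentation address range).
leafs = ["192.0.2.2", "192.0.2.1"]
print(elect_df(leafs, 100))  # 100 mod 2 = 0 -> 192.0.2.1
print(elect_df(leafs, 101))  # 101 mod 2 = 1 -> 192.0.2.2
```

With two leafs this spreads the DF role across VLANs, so the DF for one VLAN 
is the NDF for another.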
2. Seamless failover
·         ARP/ND synchronization: EVPN synchronizes the host's ARP (for IPv4) 
and ND (for IPv6) information across the fabric using EVPN Type-2 routes 
(MAC/IP Advertisement routes). This means the NDF leaf is also aware of the 
host's IP and MAC address.
·         Fabric notification and failover: If the DF leaf switch fails, the 
eBGP session drops. The NDF leaf, having already been synchronized with the 
host's reachability information, takes over as the new DF. This provides a 
seamless failover, as the host's BGP peering is quickly re-established with the 
new DF leaf using the same anycast gateway IP address.
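
The synchronization described above can be pictured with a toy model. This is 
only a sketch under stated assumptions: the `Leaf` and `MacIpRoute` classes, 
their fields, and the "local"/"synced"/"remote" labels are hypothetical, and a 
real implementation carries this state in BGP EVPN Type-2 routes rather than 
in Python objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MacIpRoute:
    """Toy stand-in for an EVPN Type-2 (MAC/IP Advertisement) route."""
    esi: str
    mac: str
    ip: str
    origin: str  # name of the advertising leaf


@dataclass
class Leaf:
    name: str
    local_esis: set
    neighbors: dict = field(default_factory=dict)  # ip -> (mac, how learned)

    def learn_local(self, esi, mac, ip):
        # Host ARP/ND was seen on a local port: install it and advertise it.
        self.neighbors[ip] = (mac, "local")
        return MacIpRoute(esi, mac, ip, self.name)

    def receive(self, route):
        if route.origin == self.name:
            return  # ignore our own advertisement
        if route.esi in self.local_esis:
            # Same Ethernet segment: the host is also directly attached here,
            # so keep a synced entry unless we already learned it locally.
            self.neighbors.setdefault(route.ip, (route.mac, "synced"))
        else:
            self.neighbors[route.ip] = (route.mac, "remote")


ESI = "00:11:22:33:44:55:66:77:88:99"  # hypothetical 10-octet ESI
leaf1, leaf2 = Leaf("leaf1", {ESI}), Leaf("leaf2", {ESI})
route = leaf1.learn_local(ESI, "00:00:5e:00:53:01", "192.0.2.10")
leaf2.receive(route)
print(leaf2.neighbors["192.0.2.10"])  # ('00:00:5e:00:53:01', 'synced')
```

The point of the model is that leaf2 already holds the host's MAC/IP binding 
before any failure, so it does not need to re-resolve the host when it takes 
over.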
3. Traffic flow management
·         Load balancing for host-advertised subnets: When the multihomed host 
advertises subnets via BGP into the EVPN fabric, the fabric sees the routes 
originating from both the DF and NDF leaf switches (with the same ESI). This 
allows the fabric to use Equal-Cost Multipath (ECMP) routing to load balance 
incoming traffic flows across both all-active links.
·         EVPN procedures for loop prevention:
o    Split horizon: This mechanism prevents a BUM packet from being forwarded 
back to the multihomed host it originated from. For VXLAN, this is typically 
done using the source IP address of the VTEP (the leaf switch) in the tunnel 
header to prevent the packet from looping back.
o    Local bias: With local bias, when a leaf switch receives BUM traffic from 
a remote VTEP that is also part of a shared Ethernet segment, it will not 
forward that traffic out of its local port for that segment. This is the main 
VXLAN-based mechanism for split horizon filtering.
o    Backup path aliasing (anycast aliasing): This is an optimization that 
helps remote leaf switches load balance traffic toward a multihomed site. It 
allows load balancing across all leaf switches attached to the same ESI, 
ensuring efficient use of all paths.
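
The local-bias check described above can be sketched as a simple filter on the 
egress leaf. This is an illustration only: the function name, the segment 
labels, and the addresses are hypothetical, and DF filtering is deliberately 
left out to keep the example focused on the source-VTEP test.

```python
def segments_to_flood(source_vtep, local_segments):
    """Return the local Ethernet segments a decapsulated BUM frame may go to.

    local_segments maps a segment label to the set of VTEP IPs attached to it.
    """
    return sorted(
        esi
        for esi, members in local_segments.items()
        # If the source VTEP is attached to this segment too, it has already
        # delivered the frame locally, so this leaf must suppress it.
        if source_vtep not in members
    )


# Segments as seen from a hypothetical leaf with VTEP 192.0.2.2.
segments = {
    "esi-1": {"192.0.2.1", "192.0.2.2"},  # shared with leaf 192.0.2.1
    "esi-2": {"192.0.2.2"},               # attached to this leaf only
}
print(segments_to_flood("192.0.2.1", segments))  # ['esi-2']
print(segments_to_flood("192.0.2.3", segments))  # ['esi-1', 'esi-2']
```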
Thanks

Gyan


On Fri, Sep 5, 2025 at 4:55 PM Patrice Brissette (pbrisset) 
<[email protected]> wrote:
Hi,

We believe this draft is ready for WG adoption.
How can we move it forward?

Draft is here: 
https://datatracker.ietf.org/doc/draft-mackenzie-bess-evpn-l3mh-proto/

Regards,
Patrice Brissette
Distinguished Engineer
Cisco Systems



_______________________________________________
BESS mailing list -- [email protected]
To unsubscribe send an email to [email protected]