16      Abstract

18         The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables
19         establishing a logical link-aggregation connection with a redundant
20         group of independent nodes.  The purpose of multi-chassis LAG is to
21         provide a solution to achieve higher network availability, while

[nit] please remove the comma after "availability" 

22         providing different modes of sharing/balancing of traffic.  RFC7432
23         defines EVPN based MC-LAG with single-active and all-active

[nit] s/EVPN based/EVPN-based

24         multi-homing load-balancing mode.  The current draft expands on
25         existing redundancy mechanisms supported by EVPN and introduces
26         support for port-active load-balancing mode.

85      1.  Introduction

87         EVPN, as per [RFC7432], provides all-active per flow load-balancing

[nit] s/per flow/per-flow/g

88         for multi-homing.  It also defines single-active with service carving
89         mode, where one of the PEs, in redundancy relationship, is active per

[nit] s/in redundancy/in a redundancy

90         service.

92         While these two multi-homing scenarios are most widely utilized in

... two multi-homing scenarios (specified in [RFC7432]) are ...

93         data center and service provider access networks, there are scenarios
94         where active-standby per interface multi-homing load-balancing is
95         useful and required.  The main consideration for this mode of

[minor] Suggestion:

... for this new mode of ...

96         load-balancing is the determinism of traffic forwarding through a
97         specific interface rather than statistical per flow load-balancing
98         across multiple PEs providing multi-homing.  The determinism provided
99         by active-standby per interface is also required for certain QOS

[minor] Suggestion:

... provided by this per-interface active-standby mode is also ...

[nit] s/per interface/per-interface/g

100        features to work.  While using this mode, customers also expect
101        minimized convergence during failures.

[major] The terms "active-standby per-interface", "per-interface", and
"port-active" are used interchangeably throughout the document.
Is it possible to converge on one term that is used consistently? Perhaps
define the term in this Sec 1 and then use just "port-active" through the rest
of the document?

[minor] "minimized" sounds a bit odd. Did you mean "fast convergence" perhaps?

103        A new type of load-balancing mode, port-active load-balancing, is
104        defined.  This draft describes how the new load-balancing mode can be
105        supported via EVPN.  The new mode may also be referred to as per
106        interface active/standby.

[minor] Text seems a bit fragmented. Suggestion:

This document defines a new type of multi-homing mode called port-active 
load-balancing, and describes how this new mode can be supported via EVPN.

[major] The new mode does provide multi-homing, but I am not sure that it 
provides load-balancing of traffic in the true sense. 
Can you please clarify what is meant by load-balancing?

108                         +-----+
109                         | PE3 |
110                         +-----+
111                      +-----------+
112                      |  MPLS/IP  |
113                      |  CORE     |
114                      +-----------+
115                    +-----+   +-----+
116                    | PE1 |   | PE2 |
117                    +-----+   +-----+
118                       |         |
119                       I1       I2
120                         \     /
121                          \   /
122                          +---+
123                          |CE1|
124                          +---+

126                              Figure 1: MC-LAG Topology

128        Figure 1 shows a MC-LAG multi-homing topology where PE1 and PE2 are

[nit] s/a MC-LAG/an MC-LAG

129        part of the same redundancy group providing multi-homing to CE1 via
130        interfaces I1 and I2.  Interfaces I1 and I2 are members of a LAG
131        running LACP protocol.  The core, shown as IP or MPLS enabled,
132        provides wide range of L2 and L3 services.  MC-LAG multi-homing

[nit] s/provides wide/provides a wide

133        functionality is decoupled from those services in the core and it
134        focuses on providing multi-homing to the CE.  With per-port active/
135        standby load-balancing, only one of the two interface I1 or I2 would

[nit] s/two interface/two interfaces

136        be in forwarding, the other interface will be in standby.  This also

[nit] s/forwarding, the/forwarding and the

137        implies that all services on the active interface are in active mode
138        and all services on the standby interface operate in standby mode.

140     1.1.  Requirements Language

142        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
143        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
144        "OPTIONAL" in this document are to be interpreted as described in BCP
145        14 [RFC2119] [RFC8174] when, and only when, they appear in all
146        capitals, as shown here.

148     2.  Multi-Chassis Link Aggregation

150        When a CE is multi-homed to a set of PE nodes using the
151        [IEEE.802.1AX_2014] Link Aggregation Control Protocol (LACP), the PEs
152        must act as if they were a single LACP speaker for the Ethernet links
153        to form and operate as a Link Aggregation Group (LAG).  To achieve
154        this, the PEs connected to the same multi-homed CE must synchronize
155        LACP configuration and operational data among them.  Interchassis
156        Communication Protocol (ICCP) [RFC7275] has been used for that
157        purpose.  EVPN LAG simplifies greatly that solution.  Along with the
158        simplification come a few assumptions:

[major] Are these assumptions or requirements/constraints? Please consider
using normative language for such operational requirements as done in Sec 3.

160        *  a CE device connected to multi-homing PEs may have a single LAG
161           with all its active links i.e. links in the LAG operate in all-
162           active load-balancing mode.

[major] Why "may have"? Is it not a requirement that the CE considers all
links to both the PEs as active and it is the PEs who would set the link
down/out-of-sync on their side based on the EVPN signaling?

164        *  Same LACP parameters MUST be configured on peering PEs such as
165           system id, port priority and port key.

[nit] s/priority and/priority, and

167        Any discrepancies from this list are out of the scope of this
of scope, right? The handling of mis-configurations/mis-wiring can be out of

168        document, as are mis-configuration and mis-wiring detection across

[nit] misconfiguration & miswiring

169        peering PEs.

171     3.  Port-active Load-balancing Procedure

173        Following steps describe the proposed procedure with EVPN LAG to

[nit] The following

174        support port-active load-balancing mode:

176        a.  The Ethernet-Segment Identifier (ESI) MUST be assigned per access
177            interface as described in [RFC7432], which may be auto derived or

[nit] auto-derived

178            manually assigned.  Access interface MAY be a Layer-2 or Layer-3

[nit] The access

179            interface.  The usage of ESI over Layer-3 interface is newly

[nit] over a Layer-3

180            described in this document.

182        b.  Ethernet-Segment (ES) MUST be configured in port-active
183            load-balancing mode on peering PEs for specific access interface.

185        c.  Peering PEs MAY exchange only Ethernet-Segment (ES) route
186            (Route Type-4) when ESI is configured on a Layer-3 interface.

188        d.  PEs in the redundancy group leverage the DF election defined in
189            [RFC8584] to determine which PE keeps the port in active mode and
190            which one(s) keep it in standby mode.  While the DF election

[nit] one keeps

191            defined in [RFC8584] is per [ES, Ethernet Tag] granularity, for
192            port-active mode of multi-homing, the DF election is done per

[nit] the port-active

193            <ES>.  The details of this algorithm are described in Section 4.

195        e.  DF router MUST keep corresponding access interface in up and
196            forwarding active state for that Ethernet-Segment

198        f.  Non-DF routers will by default implement a bidirectional blocking
199            scheme for all traffic in line with [RFC7432] Single-Active
200            blocking scheme, albeit across all VLANS.

[nit] VLANs

202            *  Non-DF routers MAY bring and keep peering access interface
203               attached to it in operational down state.

[nit] an operational

205            *  If the interface is running LACP protocol, then the non-DF PE
206               MAY also set the LACP state to OOS (Out of Sync) as opposed to
207               interface state down.  This allows for better convergence on

[nit] an interface down state

208               standby to active transition.

210        g.  For EVPN-VPWS service, the usage of primary/backup bits of EVPN
211            Layer-2 attributes extended community [RFC8214] is highly
212            recommended to achieve better convergence.
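
[minor] As a cross-check of steps d through f, the following sketch maps the
per-ES DF election result to the access port state. The enum, the function
name, and the two option flags (use_lacp_oos, bring_down) are hypothetical and
only model the MAY clauses above; they are not taken from the draft.

```python
from enum import Enum


class PortState(Enum):
    FORWARDING = "up and forwarding"
    BLOCKED = "up, bidirectionally blocked"
    LACP_OUT_OF_SYNC = "LACP out-of-sync"
    OPER_DOWN = "operationally down"


def access_port_state(is_df: bool, runs_lacp: bool,
                      use_lacp_oos: bool = True,
                      bring_down: bool = False) -> PortState:
    """Return the access port state for one Ethernet Segment on one PE."""
    if is_df:
        # Step e: the DF keeps the interface up and forwarding.
        return PortState.FORWARDING
    if runs_lacp and use_lacp_oos:
        # Step f, second bullet: signal OOS instead of taking the port down.
        return PortState.LACP_OUT_OF_SYNC
    if bring_down:
        # Step f, first bullet: keep the interface operationally down.
        return PortState.OPER_DOWN
    # Step f default: block all traffic in both directions, across all VLANs.
    return PortState.BLOCKED
```

If this matches the intent, stating the default and the two optional
behaviours this explicitly in step f would help implementers.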

214     4.  Designated Forwarder Algorithm to Elect per Port-active PE

216        The ES routes, running in port-active load-balancing mode, are
217        advertised with the new Port Mode Load-Balancing capability in the DF
218        Election Extended Community defined in [RFC8584].  Moreover, the ES
219        associated to the port leverages existing procedure of Single-Active,

[nit] associated with
[nit] leverages the existing

220        and signals Single-Active Multihomed site redundancy mode along with
221        Ethernet-AD per-ES route (Section 7.5 of [RFC7432]).  Finally the
222        ESI-label based split-horizon procedures in Section 8.3 of [RFC7432]

[nit] ESI label-based

223        should be used to avoid transient echo'ed packets when Layer-2
224        circuits are involved.

226        The various algorithms for DF Election are discussed in Sections 4.2
227        to 4.5 for completeness, although the choice of algorithm in this

[nit] completeness even though the choice of the algorithm

228        solution doesn't affect complexity or performance as in other load-
229        balancing modes.

231     4.1.  Capability Flag

233        [RFC8584] defines a DF Election extended community, and a Bitmap
234        field to encode "capabilities" to use with the DF election algorithm
235        in the DF algorithm field.  Bitmap (2 octets) is extended by the
236        following value:

[major] The extension is only the P bit. The text gives the wrong impression
that the D and AC-DF bits are also being extended by this document. Please
consider changing this text to clarify that the D and AC-DF bits are existing
bits that are also used by this mode.

238                                 1 1 1 1 1 1
239             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
240            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
241            |D|A|     |P|                   |
242            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

244         Figure 2: Amended Bitmap field in the DF Election Extended Community

246        Bit 0:    D bit or 'Don't Preempt' bit, as explained in
247                  [I-D.ietf-bess-evpn-pref-df].

249        Bit 1:    AC-DF Capability (AC-Influenced DF election), as explained
250                  in [RFC8584].

252        Bit 5:    (corresponds to Bit 29 of the DF Election Extended
253                  Community and it is defined by this document): 'Port Mode

[minor] Suggest removing this "Bit 29"; I don't see similar counting of bits
within the entire ExtComm being done anywhere. "Bit 5" of the field is
clear enough.

254                  Load-Balancing' Capability (P bit hereafter), determines

[nit] the use of quotes seems odd

255                  that the DF-Algorithm should be modified to consider the
256                  port ES only and not the Ethernet Tags.

[major] Seems odd to call this "port mode load-balancing" when there is no
load-balancing? Wouldn't "port active mode multihoming" be more accurate?
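
[minor] To confirm the bit position without the "Bit 29" wording, here is a
minimal sketch of how I read Figure 2. It assumes bit 0 is the most
significant bit of the 2-octet Bitmap field, as the figure suggests; the
constant and function names are illustrative only.

```python
BITMAP_WIDTH = 16  # the Bitmap field is 2 octets


def bit_mask(position: int) -> int:
    """Mask for a bit position, counting bit 0 as the most significant bit."""
    if not 0 <= position < BITMAP_WIDTH:
        raise ValueError("bit position out of range")
    return 1 << (BITMAP_WIDTH - 1 - position)


D_BIT = bit_mask(0)      # Don't Preempt (existing bit)
AC_DF_BIT = bit_mask(1)  # AC-Influenced DF election (existing bit)
P_BIT = bit_mask(5)      # Port Mode capability (the only new bit)


def encode_bitmap(*flags: int) -> bytes:
    """Combine capability flags into the 2-octet Bitmap, network byte order."""
    value = 0
    for flag in flags:
        value |= flag
    return value.to_bytes(2, "big")


def has_flag(bitmap: bytes, flag: int) -> bool:
    """Check whether a capability flag is set in a received Bitmap."""
    return bool(int.from_bytes(bitmap, "big") & flag)
```

Under that assumption the P bit alone encodes as 0x0400, and D together with
P as 0x8400.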

258     4.2.  Modulo-based Algorithm

260        The default DF Election algorithm, or modulus-based algorithm as in
261        [RFC7432] and updated by [RFC8584], is used here, at the granularity
262        of ES only.  Given that ES-Import Route Target extended community may
263        be auto-derived and directly inherits its auto-derived value from ESI
264        bytes 1-6, many operators differentiate ESI primarily within these
265        bytes.  As a result, bytes 3-6 are used to determine the designated
266        forwarder using Modulo-based DF assignment, achieving good entropy
267        during Modulo calculation across ESIs:
268        Assuming a redundancy group of N PE nodes, the PE with ordinal i is
269        the DF for an <EE> when (Es mod N) = i, where Es represents bytes 3-6
270        of that ESI.
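
[minor] To check my reading of this paragraph (taking <EE> to mean <ES>), here
is a minimal sketch of the per-ES modulo election. It assumes the 10-octet ESI
is numbered from byte 0 (the Type octet), so that "bytes 3-6" is the slice
esi[3:7], and that ordinals come from sorting the PE IP addresses in ascending
order as in [RFC7432]. The function name and addresses are illustrative only.

```python
import ipaddress
from typing import List


def modulo_df(esi: bytes, pe_addresses: List[str]) -> str:
    """Return the IP address of the DF for an ES using (Es mod N) = i."""
    if len(esi) != 10:
        raise ValueError("an ESI is 10 octets long")
    if not pe_addresses:
        raise ValueError("the redundancy group is empty")
    # Ordinal i is the position of a PE in the ascending list of addresses.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # Es is the unsigned integer formed by ESI bytes 3-6 (assumed esi[3:7]).
    es = int.from_bytes(esi[3:7], "big")
    return ordered[es % len(ordered)]


esi = bytes.fromhex("00112233445566778899")
df = modulo_df(esi, ["192.0.2.2", "192.0.2.1"])
```

Here Es is 0x33445566, which is even, so with two PEs the DF is the PE with
ordinal 0, i.e. 192.0.2.1. If the byte numbering is meant differently, it
would be worth stating it explicitly in the text.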

272     4.3.  HRW Algorithm

274        Highest Random Weight (HRW) algorithm defined in [RFC8584] MAY also
275        be used and signaled, and modified to operate at the granularity of
276        <ES> rather than per <ES, VLAN>.

278        Section 3.2 of [RFC8584] describes computing a 32 bit CRC over the

[nit] 32-bit

279        concatenation of Ethernet Tag and ESI.  For port-active
280        load-balancing mode, the Ethernet Tag is simply removed from the CRC
281        computation.

283        DF(Es) denotes the DF and BDF(Es) denote the BDF for the ESI es; Si
284        is the IP address of PE i; and Weight is a function of Si, and Es.

286        1.  DF(Es) = Si| Weight(Es, Si) >= Weight(Es, Sj), for all j.  In the
287            case of a tie, choose the PE whose IP address is numerically the
288            least.  Note that 0 <= i,j < number of PEs in the redundancy
289            group.

291        2.  BDF(Es) = Sk| Weight(Es, Si) >= Weight(Es, Sk), and Weight(Es,
292            Sk) >= Weight(Es, Sj).  In the case of a tie, choose the PE whose
293            IP address is numerically the least.

295        Where:

297        *  DF(Es) is defined to be the address Si (index i) for which
298           Weight(Es, Si) is the highest; 0 <= i < N-1.

300        *  BDF(Es) is defined as that PE with address Sk for which the
301           computed Weight is the next highest after the Weight of the DF.  j
302           is the running index from 0 to N-1; i and k are selected values.
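
[minor] A sketch of how I read the per-ES HRW election may help confirm the
intent. The weight function below is a stand-in (a plain CRC-32 over the ESI
and the PE address, computed with zlib.crc32), not the exact Weight function
of [RFC8584]; only the ranking and the tie-break are meant to follow the text.
All names are hypothetical.

```python
import ipaddress
import zlib
from typing import List, Optional, Tuple


def weight(esi: bytes, pe: str) -> int:
    """Stand-in weight: CRC-32 over ESI and PE address, no Ethernet Tag."""
    return zlib.crc32(esi + ipaddress.ip_address(pe).packed)


def hrw_elect(esi: bytes, pes: List[str]) -> Tuple[str, Optional[str]]:
    """Return (DF, BDF) for an ES; BDF is None with a single PE."""
    if not pes:
        raise ValueError("the redundancy group is empty")
    # Highest weight first; on a tie, the numerically lowest IP address wins.
    ranked = sorted(
        pes,
        key=lambda pe: (-weight(esi, pe), int(ipaddress.ip_address(pe))),
    )
    df = ranked[0]
    bdf = ranked[1] if len(ranked) > 1 else None
    return df, bdf
```

One property worth noting in the text: when the DF is removed from the
candidate list, the previous BDF becomes the DF, which is what makes the BDF
useful for fast switchover.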

304     4.4.  Preference-based DF Election

306        When the new capability 'Port-Mode' is signaled, the algorithm is
307        modified to consider the port only and not any associated Ethernet
308        Tags.  Furthermore, the "port-based" capability MUST be compatible
309        with the "Don't Preempt" bit.  When an interface recovers, a peering
310        PE signaling D-bit will enable non-revertive behaviour at the port
311        level.

313     4.5.  AC-Influenced DF Election

315        The AC-DF bit MUST be set to 0 when advertising Port Mode Load-
316        Balancing capability (P=1).  When an AC (sub-interface) goes down, it
317        does not influence the DF election.  The peer's Ethernet A-D per EVI
318        is ignored in all Port Mode DF Election algorthms.

[nit] algorithms

320        Upon receiving AC-DF bit set (A=1) from a remote PE, it MUST be

[nit] the AC-DF bit set

321        ignored when performing Port-Mode DF Election.
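
[minor] The two rules in this section could be sketched as below, assuming the
same bit numbering as Figure 2 (bit 0 is the most significant bit, so A is
0x4000 and P is 0x0400). The function names are hypothetical.

```python
AC_DF_BIT = 0x4000  # bit 1 of the Bitmap, assuming bit 0 is the MSB
P_BIT = 0x0400      # bit 5 of the Bitmap, assuming bit 0 is the MSB


def sanitize_for_advertisement(bitmap: int) -> int:
    """Clear the AC-DF bit whenever the Port Mode capability is advertised."""
    if bitmap & P_BIT:
        return bitmap & ~AC_DF_BIT & 0xFFFF
    return bitmap & 0xFFFF


def ac_df_applies(received_bitmap: int, port_mode_election: bool) -> bool:
    """Tell whether a received A=1 influences the DF election."""
    if port_mode_election:
        # A received AC-DF bit is ignored in Port Mode DF election.
        return False
    return bool(received_bitmap & AC_DF_BIT)
```

If this is the intended behaviour, a received A=1 together with P=1 is
tolerated rather than treated as an error, which may be worth saying outright.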

323     5.  Convergence considerations

325        To improve the convergence, upon failure and recovery, when

[nit] when the

326        port-active load-balancing mode is used, some advanced
327        synchronization between peering PEs may be required.  Port-active is
328        challenging in a sense that the "standby" port is in down state.  It

[nit] in the sense
[nit] in a down

329        takes some time to bring a "standby" port in up-state and settle the

[nit] port to an up state

330        network.  For IRB and L3 services, ARP / ND cache may be
331        synchronized.  Moreover, associated VRF tables may also be
332        synchronized.  For L2 services, MAC table synchronization may be
333        considered.

335        Finally, for members of a LAG running LACP the ability to set the
336        "standby" port in "out-of-sync" state a.k.a "warm-standby" can be
337        leveraged.

339     5.1.  Primary / Backup per Ethernet-Segment

341        The EVPN Layer 2 Attributes Control Flags extended community SHOULD
342        be advertised in Ethernet A-D per ES route for fast convergence.

344        Only the P and B bits are relevant to this document, and only in the
345        context of Ethernet A-D per ES routes:

[minor] Please consider providing references for the ExtComm and the bits on
their first use.

347        *  When advertised, the EVPN Layer 2 Attributes Control Flags
348           extended community SHALL have only P or B bits set and all other
349           bits and fields MUST be zero.

351        *  A remote PE receiving the optional EVPN Layer 2 Attributes Control
352           Flags extended community in Ethernet A-D per ES routes SHALL
353           consider only P and B bits.

[minor] In other words, the other bits are ignored and this is not considered
an error/malformed, right?

355        For EVPN Layer 2 Attributes Control Flags extended community sent and
356        received in Ethernet A-D per EVI routes used in [RFC8214], [RFC7432]
357        and [I-D.ietf-bess-evpn-vpws-fxc]:

359        *  P and B bits received are overridden by "parent" bits on Ethernet
360           A-D per ES above.

362        *  Other fields and bits of the extended community are used according
363           to the procedures of those documents.
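
[minor] The override rule could be illustrated as follows. This is only a
sketch of my reading: the P and B bits are modelled as booleans rather than
as bit positions of the extended community, and the type and function names
are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrimaryBackup:
    primary: bool  # P bit
    backup: bool   # B bit


def resolve_role(per_evi: PrimaryBackup,
                 per_es: Optional[PrimaryBackup]) -> PrimaryBackup:
    """Pick the P/B bits a remote PE should honor for a given service.

    per_es is None when the peer did not attach the extended community to
    its Ethernet A-D per ES route (for example, an older implementation).
    """
    if per_es is not None:
        # The "parent" per-ES bits override the per-EVI bits.
        return per_es
    return per_evi
```

The None case corresponds to a peer that only advertises the bits per EVI, so
the same function also covers the fallback to per-EVI bits.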

365     5.2.  Backward Compatibility

367        Implementations that comply with [RFC7432] or [RFC8214] only (i.e.,
368        implementations that predate this document) will not advertise the

[nit] predate this specification

369        EVPN Layer 2 Attributes Control Flags extended community in Ethernet
370        A-D per ES routes.  That means that all remote PEs in the ES will not
371        receive P and B bit per ES and will continue to receive and honour

[nit] honor

372        the P and B bits received in Ethernet A-D per EVI route(s).
373        Similarly, an implementation that complies with [RFC7432] or
374        [RFC8214] only and that receives an EVPN Layer 2 Attributes Control
375        Flags extended community will ignore it and will continue to use the
376        default path resolution algorithm.

describe here in brief the multi-homing/load-balancing mode that would result
with some reference pointers.

378     6.  Applicability

[minor] Suggestion: Consider moving the first half of this section into
Section 1 to give the reader better context, and the second half into
Section 2.

380        A common deployment is to provide L2 or L3 service on the PEs
381        providing multi-homing.  The services could be any L2 EVPN such as
382        EVPN VPWS, EVPN [RFC7432], etc.  L3 service could be in VPN context

[nit] a VPN

383        [RFC4364] or in global routing context.  When a PE provides first hop

[nit] in a global

384        routing, EVPN IRB could also be deployed on the PEs.  The mechanism
385        defined in this document is used between the PEs providing L2 and/or
386        L3 services, when per interface single-active load-balancing is
387        desired.

389        A possible alternate solution is the one described in this draft is
390        MC-LAG with ICCP [RFC7275] active-standby redundancy.  However, ICCP
391        requires LDP to be enabled as a transport of ICCP messages.  There
392        are many scenarios where LDP is not required e.g. deployments with
393        VXLAN or SRv6.  The solution defined in this draft with EVPN does not
394        mandate the need to use LDP or ICCP and is independent of the
395        underlay encapsulation.

397     7.  Overall Advantages

context on the benefits/reason for introduction of this mode.

399        The use of port-active multi-homing brings the following benefits to
400        EVPN networks:

402        a.  Open standards based per interface single-active load-balancing

[nit] standards-based

403            mechanism that eliminates the need to run ICCP and LDP (e.g. they

[nit] e.g.,

404            may be running VXLAN or SRv6 in the network).

406        b.  Agnostic of underlay technology (MPLS, VXLAN, SRv6) and
407            associated services (L2, L3, Bridging, E-LINE, etc).

409        c.  Provides a way to enable deterministic QOS over MC-LAG attachment
410            circuits.

412        d.  Fully compliant with [RFC7432], does not require any new protocol
413            enhancement to existing EVPN RFCs.

415        e.  Can leverage various DF election algorithms e.g. modulo, HRW,
416            etc.

418        f.  Replaces legacy MC-LAG ICCP-based solution, and offers following

[nit] the following

419            additional benefits:

421            *  Efficiently supports 1+N redundancy mode (with EVPN using BGP
422               RR) where as ICCP requires full mesh of LDP sessions among PEs
423               in redundancy group.

[nit] whereas
[nit] requires a full
[nit] in the redundancy

425            *  Fast convergence with mass-withdraw is possible with EVPN, no
426               equivalent in ICCP.

428     8.  IANA Considerations

430        This document solicits the allocation of the following values:
registry group

432        *  Bit 5 in the [RFC8584] DF Election Capabilities registry, with
433           name "P" for Port Mode Load-Balancing.

435     9.  Security Considerations

437        The same Security Considerations described in [RFC7432] and [RFC8584]
438        are valid for this document.

440        By introducing a new capability, a new requirement for unanimity (or
441        lack thereof) between PEs is added.  Without consensus on the new DF
442        election procedures and Port Mode, the DF election algorithm falls
443        back to the default DF election as provided in [RFC8584] and
444        [RFC7432].  This behavior could be exploited by an attacker that
445        manages to modify the configuration of one PE in the ES so that the
446        DF election algorithm and capabilities in all the PEs in the ES fall
447        back to the default DF election.  If that is the case, the PEs will
448        be exposed to the same unfair load balancing, service disruption, and
449        possibly black-holing or duplicate traffic mentioned in those
450        documents and their security sections.
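
[minor] The fallback described here can be sketched as follows, assuming (per
my reading of [RFC8584]) that any mismatch, or a PE that sends no DF Election
Extended Community at all, puts every PE in the ES back on the default
algorithm with no capabilities. The values 0 (default) and 1 (HRW) for the DF
algorithm follow [RFC8584]; the data layout and names are illustrative.

```python
from typing import Dict, Optional, Tuple

DEFAULT_ELECTION: Tuple[int, int] = (0, 0x0000)  # default alg, no capabilities

# Maps a PE address to (DF algorithm, capability bitmap), or to None when the
# PE did not advertise the DF Election Extended Community.
Advertisements = Dict[str, Optional[Tuple[int, int]]]


def agreed_election(advertisements: Advertisements) -> Tuple[int, int]:
    """Return the (algorithm, capabilities) all PEs in the ES agree on."""
    values = set(advertisements.values())
    if len(values) == 1 and None not in values:
        return next(iter(values))
    # No unanimity: every PE falls back to the default DF election.
    return DEFAULT_ELECTION
```

A single PE changed from (1, 0x0400) to (1, 0x0000) is enough to move the
whole ES off Port Mode, which is the exposure this paragraph describes.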

not do more harm by making the configs on the dual-home PEs to be not
consistent? Without detection mechanism, the service impact may be far greater
in this case?

452     10.  Acknowledgements

509     12.2.  Informative References

511        [I-D.ietf-bess-evpn-vpws-fxc]
512                   Sajassi, A., Brissette, P., Uttaro, J., Drake, J.,
513                   Boutros, S., and J. Rabadan, "EVPN VPWS Flexible Cross-
514                   Connect Service", Work in Progress, Internet-Draft, draft-
515                   ietf-bess-evpn-vpws-fxc-05, 8 February 2022,
516                   <
517                   vpws-fxc-05.txt>.

519        [IEEE.802.1AX_2014]
520                   IEEE, "IEEE Standard for Local and metropolitan area
521                   networks -- Link Aggregation", IEEE 802.1AX-2014,
522                   DOI 10.1109/IEEESTD.2014.7055197, 24 December 2014,
523                   <
524                   opac?punumber=6997981>.

[major] Should the reference to MC-LAG not be normative since the document
talks about setting port in "out-of-sync" state?

526        [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
527                   Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
528                   2006, <>.

