Authors,

As is customary, please find below my document shepherd review of this draft.
The comments are mainly of an editorial nature or suggest improvements
to aid readability. Please treat these comments (prepended with MB>) as
you would any other working group last call comments.

Best regards

Matthew

===

Fast Recovery for EVPN Designated Forwarder Election
draft-ietf-bess-evpn-fast-df-recovery-05

Abstract

   Ethernet Virtual Private Network (EVPN) solution provides Designated
MB> /Ethernet/The Ethernet
   Forwarder election procedures for multihomed Ethernet Segments.
   These procedures have been enhanced further by applying Highest
   Random Weight (HRW) Algorithm for Designated Forwarded election in
   order to avoid unnecessary DF status changes upon a failure.  This
   draft improves these procedures by providing a fast Designated
MB> /draft/document
   Forwarder (DF) election upon recovery of the failed link or node
   associated with the multihomed Ethernet Segment.  The solution is
   independent of number of EVIs associated with that Ethernet Segment
MB> /of number/of the number
   and it is performed via a simple signaling between the recovered PE
   and each of the other PEs in the multihoming group.

[...]

1.  Introduction

   Ethernet Virtual Private Network (EVPN) solution [RFC7432] is
MB> /Ethernet/The Ethernet
   becoming pervasive in data center (DC) applications for Network
   Virtualization Overlay (NVO) and DC interconnect (DCI) services, and
   in service provider (SP) applications for next generation virtual
   private LAN services.

[...]

   The EVPN specification [RFC7432] describes DF election procedures for
MB> I think you just need to say [RFC7432] describes...
   multihomed Ethernet Segments.  These procedures are enhanced further
   in [RFC8584] by applying Highest Random Weight Algorithm for DF
   election in order to avoid DF status change unnecessarily upon a link
   or node failure associated with the multihomed Ethernet Segment.
MB> I found the above hard to parse.
    Maybe replace it with:
    "These procedures are enhanced further in [RFC8584] by applying
    Highest Random Weight Algorithm for DF election in order to avoid
    unnecessary DF status changes upon a link or node failure associated
    with the multihomed Ethernet Segment."

[...]

1.1.  Terminology

   Provider Edge (PE):  A device that sits in the boundary of Provider
      and Customer networks and performs encap/decap of data from L2 to
      L3 and vice-versa.
MB> Not sure you need to define PE as it is a well-known term, but in any
    case I think your definition differs from the ones I could find in
    previous RFCs. Maybe you can just delete it.

   Designated Forwarder (DF):  A PE that is currently forwarding
      (encapsulating/decapsulating) traffic for a given VLAN in and out
      of a site.

2.  Challenges with Existing Solution

   In EVPN technology, multiple PE devices have the ability to encap and
   decap data belonging to the same VLAN.  In certain situations, this
   may cause L2 duplicates and even loops if there is a momentary
   overlap of forwarding roles between two or more PE devices, leading
   to broadcast storms.

   EVPN [RFC7432] currently uses timer based synchronization among PE
   devices in redundancy group that can result in duplications (and even
   loops) because of multiple DFs if the timer is too short or
   blackholing if the timer is too long.

   Using split-horizon filtering (Section 8.3 of [RFC7432]) can prevent
   loops (but not duplicates), however if there are overlapping DFs in
MB> I suggest you split the sentence to make it more readable:
    "...(but not duplicates). However, if there are..."
   two different sites at the same time for the same VLAN, the site
   identifier will be different upon re-entry of the packet and hence
   the split-horizon check will fail, leading to L2 loops.

[...]

   However, upon PE insertion or port bring-up (recovery event), HRW
MB> Do you mean "...or port bring-up following a recovery event,"?
   also cannot help as a transfer of DF role to the newly inserted
   device/port must occur while the old DF is still active.

                            +---------+
         +-------------+    |         |
         |             |    |         |
       / |    PE1      |----|         |   +-------------+
      /  |             |    |  MPLS/  |   |             |---CE3
     /   +-------------+    |  VxLAN/ |   |     PE3     |
   CE1 -                    |  Cloud  |   |             |
     \   +-------------+    |         |---|             |
      \  |             |    |         |   +-------------+
       \ |    PE2      |----|         |
         |             |    |         |
         +-------------+    |         |
                            +---------+

            Figure 1: CE1 multihomed to PE1 and PE2.

   In the Figure 1, when PE2 is inserted or booted up, PE1 will transfer
MB> /transfer/transfer the
   DF role of some VLANs to PE2 to achieve load balancing.  However,
   because there is no handshake mechanism between PE1 and PE2,
   duplication of DF roles for a given VLAN is possible.  Duplication of
   DF roles may eventually lead to duplication of traffic as well as L2
   loops.

   Current EVPN specification [RFC7432] and [RFC8584] relies on a timer-
MB> /specification/specifications
MB> /relies/rely
   based approach for transferring the DF role to the newly inserted
   device.  This can cause the following issues:

   *  Loops/Duplicates if the timer value is too short

   *  Prolonged Traffic Blackholing if the timer value is too long

3.  DF Election Synchronization Solution

   The solution relies on the concept of common clock alignment between
   partner PEs participating to a common Ethernet Segment.  The main
   idea is to have all peering PEs of that Ethernet Segment perform DF
   election, and apply their resulting carving state, at a same
   well-known time.
MB> It would be clearer if you could identify the partner PEs on a
    figure, e.g. Figure 1.

   The DF Election procedure, as described in [RFC7432] and as
   optionally signalled in [RFC8584], is applied.  All PEs attached to a
   given Ethernet Segment are clock-synchronized; using a networking
MB> /clock-synchronized;/clock-synchronized
   protocol for clock synchronization (e.g.  NTP, PTP, etc.).
   Newly inserted device PE or during failure recovery of a PE, that PE
   communicates the current time to peering partners plus the remaining
   peering timer time left.
MB> The first part of the above does not parse. Do you mean "When a new
    PE is inserted or an existing PE device recovers,..."?
   This constitutes an "end time" or "absolute time" as seen from local
   PE.  That absolute time is called "Service Carving Time" (SCT).

   A new BGP Extended Community is advertised along with Ethernet
MB> Maybe say it is the "Service Carving Timestamp" here.
   Segment route (RT-4) to communicate to other partners the Service
   Carving Time.

   Upon reception of that new BGP Extended Community, partner PEs know
MB> /know/can determine
   exactly its carving time.  The notion of skew is introduced to
   eliminate any potential duplicate traffic or loops.  They add a skew
MB> Who is "they"? Do you mean "The receiving partner PEs"?
   (default = -10ms) to the Service Carving Time to enforce this.  The
   previously inserted PE(s) must carve first, followed shortly(skew) by
   the newly insterted PE.

   To summarize, all peering PEs carve almost simultaneously at the time
   announced by newly added/recovered PE.  The newly inserted PE
   initiates the SCT, and carves immediately on peering timer expiry.
   The previously inserted PE(s) receiving Ethernet Segment route (RT-4)
   with a SCT BGP extended community, carve shortly before Service
   Carving Time.

3.1.  Advantages
MB> This section seems out of place in a protocol spec. I suggest moving
    this text to the end of the introduction.

   There are multiples advantages of using the approach.  Here is a
   non-exhaustive list:

   *  A simple uni-directional signaling is all that is needed

   *  Backwards-compatible: PEs supporting only older [RFC7432] shall
      simply discard unrecognized new "Service Carving Timestamp" BGP
      Extended Community

   *  Multiple DF Election algorithms can be supported:

      -  [RFC7432] default ordered list ordinal algorithm (Modulo),

      -  [RFC8584] highest-random weight, etc.
   *  Independent of BGP transmission delay regarding Ethernet Segment
      route (RT-4)

   *  Agnostic of the time synchronization mechanism used (e.g.  NTP,
      PTP, etc.)

[...]

3.2.  BGP Encoding

[...]

   This capability is used in conjunction with the agreed upon DF Type
   (DF Election Type).  For example if all the PEs in the Ethernet
   Segment indicated that they have Time Synchronization capability and
   they want the DF type to be HRW, then HRW algorithm is used in
MB> /then HRW/then the HRW
   conjunction with this capability.
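
For illustration only, the timing described in Section 3 of the draft (SCT = current time plus the remaining peering timer, with the previously inserted PEs carving one skew earlier) can be sketched roughly as below. This is a minimal sketch and not text from the draft: the function names, the use of Python datetime values, and the assumption that all PEs share a synchronized clock are hypothetical choices made for the example. Only the -10ms default skew comes from the quoted text.

```python
from datetime import datetime, timedelta, timezone

# Default skew quoted in the draft text (negative: partners carve earlier).
DEFAULT_SKEW = timedelta(milliseconds=-10)


def service_carving_time(now: datetime, remaining_peering_timer: timedelta) -> datetime:
    """Hypothetical helper: the absolute time the recovering PE would advertise.

    Per the quoted text, this is the current time plus the time left on the
    peering timer, as seen from the local (recovering) PE.
    """
    return now + remaining_peering_timer


def carve_time(sct: datetime, is_recovering_pe: bool,
               skew: timedelta = DEFAULT_SKEW) -> datetime:
    """Hypothetical helper: when a given PE applies its DF election result.

    The recovering PE carves at the SCT itself; previously inserted PEs
    add the (negative) skew so that they carve shortly before it.
    """
    return sct if is_recovering_pe else sct + skew


# Example: PE2 recovers with 3 seconds left on its peering timer.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
sct = service_carving_time(now, timedelta(seconds=3))
pe1_carve = carve_time(sct, is_recovering_pe=False)  # previously inserted PE
pe2_carve = carve_time(sct, is_recovering_pe=True)   # recovering PE
print(pe1_carve.isoformat(), pe2_carve.isoformat())
```

In this sketch PE1 gives up the DF role 10ms before PE2 takes it, which matches the stated intent that the previously inserted PE(s) carve first.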