Benjamin Kaduk has entered the following ballot position for draft-ietf-bess-mvpn-fast-failover-13: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-mvpn-fast-failover/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Let's talk about what the requirements are for consistency across PEs in the algorithm for selecting the Primary Upstream PE. Section 4 notes that "all the PEs of that MVPN [are required] to follow the same UMH selection procedure", but leaves the option of non-revertive behavior as something that "MAY also be supported by an implementation", without any requirement for consistency across all PEs. It seems to me that if some PEs use non-revertive behavior and others do not, then in some cases they will disagree as to which PE is the Primary (or active) PE. That seems to conflict with the initial guidance that all PEs need to pick the same one. Is it perhaps that the PEs need to agree on which PE is to be advertised as Primary, but not necessarily to actually use that one for traffic? Or am I missing something?

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

   Section 3 describes local procedures allowing an egress PE (a PE connected to a receiver site) to take into account the status of P-tunnels to determine the Upstream Multicast Hop (UMH) for a given (C-S, C-G). [...]

Does it also apply to (C-*, C-G)?
(I'll just mention it once, but the handling seems to be somewhat inconsistent throughout the document. (C-*,C-G) gets mentioned sometimes but not always, and I see no obvious pattern for when it is or is not included. I think I see some instances where (C-*, C-G) does not make sense, so it would probably not be a universal replacement.)

   Section 5 describes a "hot leaf standby" mechanism that can be used to improve failover time in MVPN. The approach combines mechanisms defined in Section 3 and Section 4 has similarities with the solution described in [RFC7431] to improve failover times when PIM routing is used in a network given some topology and metric constraints.

nit: grammar issue around "has similarities with" (maybe needs a leading "and"?)

   VPNs. An operator would enable these mechanisms using a method discussed in Section 3 in combination with the redundancy provided by a standby PE connected to the source of the multicast flow, and it is assumed that all PEs in the network would support these mechanisms for the procedures to work. In the case that a BGP implementation

Is it a matter of "the procedure will not work at all unless all PEs in the network support it", or "only the PEs that support it will get the benefits of it"? [The next sentence suggests an answer...]

Section 3

   Section 9.1.1 of [RFC6513] are applicable when using I-PMSI P-tunnels. That document is a foundation for this document, and its processes all apply here. Section 9.1.1 mandates the use of specific procedures for sending intra-AS I-PMSI A-D Routes.

(nit) the second "Section 9.1.1" also refers to RFC 6513, not to this document. A bare section reference would by default be read as pointing into this document.

(not-nit) The referenced procedure seems to be about processing, not sending, intra-AS I-PMSI A-D routes. Am I misreading something?

Section 3.1

   Different factors can be considered to determine the "status" of a P-tunnel and are described in the following sub-sections.
   The optional procedures described in this section also handle the case the downstream PEs do not all apply the same rules to define what the status of a P-tunnel is (please see Section 6), and some of them will produce a result that may be different for different downstream PEs.

nit: I think it's better to put a word like "where" in "the case the downstream PEs".

Section 3.1.3

   corresponding P-tunnel MUST be re-evaluated. If the P-tunnel transitions from Up to Down state, the Upstream PE that is the ingress of the P-tunnel MUST NOT be considered a valid UMH.

(nit?) I'm not sure how much precedent there is for using "valid" in this context. IIUC, the previous discussion of this process referred only to whether a PE is a candidate for being the UMH.

Section 3.1.5

   When such a procedure is used, in the context where fast restoration mechanisms are used for the P-tunnels, a configurable timer MUST be set on the downstream PE to wait before updating the UMH, to let the P-tunnel restoration mechanism to execute its actions. An implementation SHOULD use three seconds as the default value for this timer.

How does this interact with the value of the maximum inter-packet time? Suppose that I know to expect at least one packet every ten seconds. Do I wait ten seconds after receiving the last packet, and then another three seconds, before engaging in a UMH change?

   In cases where this mechanism is used in conjunction with the method described in Section 5, no prior knowledge of the rate of the multicast streams is required; downstream PEs can compare reception on the two P-tunnels to determine when one of them is down.

This feels a little underspecified. Is there a reference, or more guidance we could give, about turning a stream of packets received on one tunnel into a maximum inter-packet time on another tunnel that supposedly carries the same traffic?
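To make the timer question concrete, here is a minimal Python sketch of the two readings I can see. The class and method names are hypothetical. The second reading is only my guess at a plausible alternative, and the draft does not say which one, if either, is intended.

```python
from dataclasses import dataclass


@dataclass
class TunnelWatch:
    """Hypothetical per-P-tunnel state on a downstream PE (names are illustrative)."""

    max_inter_packet: float  # operator's expected maximum gap between packets, in seconds
    hold_timer: float = 3.0  # the draft's SHOULD-default delay before updating the UMH

    def switch_time_sequential(self, last_rx: float) -> float:
        # Reading 1: the P-tunnel is declared Down only once max_inter_packet
        # has elapsed with no traffic; the hold timer then starts from that moment.
        return last_rx + self.max_inter_packet + self.hold_timer

    def switch_time_concurrent(self, last_rx: float) -> float:
        # Reading 2: both timers run from the last received packet, and the
        # UMH is updated once the longer of the two has expired.
        return last_rx + max(self.max_inter_packet, self.hold_timer)


watch = TunnelWatch(max_inter_packet=10.0)
print(watch.switch_time_sequential(0.0))  # 13.0
print(watch.switch_time_concurrent(0.0))  # 10.0
```

With a ten-second maximum inter-packet time, the two readings differ by the full three seconds. That gap is why I think the text should say which one applies.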
Section 3.1.6

   * one octet-long field of TLV's Type value (Section 7.3)
   * one octet-long field of the length of the Value field in octets
   * variable length Value field.

   The length of a TLV MUST be multiple of four octets.

I assume this is the total length, not the value in the length field?

   The BFD Discriminator attribute MUST be considered malformed if its length is not a non-zero multiple of four. If the attribute considered malformed, the UPDATE message SHALL be handled using the approach of Attribute Discard per [RFC7606].

nit: s/attribute considered/attribute is considered/

Section 3.1.6.1

   o MUST periodically transmit BFD Control packets over the x-PMSI P-tunnel after the P-tunnel is considered established. Note that the methods to declare a P-tunnel has been established are outside the scope of this specification.

Is there a good reference for how to choose the period of transmission?

   If the tracking of the P-tunnel by using a P2MP BFD session is enabled after the x-PMSI A-D Route has been already advertised, the x-PMSI A-D Route MUST be re-sent with precisely the same attributes as before and the BFD Discriminator attribute included.

Pedantically, "precisely the same attributes as before" seems incompatible with adding the BFD Discriminator attribute. Phrasing that describes "the only change between the previous advertisement and the new advertisement" would avoid this potential issue. (The same applies to removing the BFD Discriminator attribute, a couple of paragraphs later.)

Section 3.1.6.2

   o MUST use the source IP address of the BFD Control packet, the value of the BFD Discriminator field, and the x-PMSI Tunnel Identifier [RFC6514] the BFD Control packet was received to properly demultiplex BFD sessions.

nit: missing word around "the BFD Control packet was received" (maybe "received on/in"?).
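To make the Section 3.1.6 length question concrete: suppose the multiple-of-four rule applies to the whole TLV (Type + Length + Value). A receiver would then validate roughly as in the Python sketch below. The function and exception names are hypothetical, and the type value 1 in the usage line is arbitrary rather than a registered code point. I have also left out the attribute's fixed fields, since their layout is not at issue here.

```python
from typing import List, Tuple


class MalformedAttribute(Exception):
    """Signals that the UPDATE should be handled via Attribute Discard (RFC 7606)."""


def check_attribute_length(attr: bytes) -> None:
    # Section 3.1.6: the attribute is malformed unless its length is a
    # non-zero multiple of four.
    if len(attr) == 0 or len(attr) % 4 != 0:
        raise MalformedAttribute(f"attribute length {len(attr)} is not a non-zero multiple of four")


def parse_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Parse one-octet Type / one-octet Length / Value TLVs.

    ASSUMPTION: the multiple-of-four rule applies to the total TLV size
    (2 + Length), not to the value carried in the Length field.
    """
    tlvs: List[Tuple[int, bytes]] = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise MalformedAttribute("truncated TLV header")
        tlv_type, value_len = data[offset], data[offset + 1]
        total = 2 + value_len  # Type octet + Length octet + Value
        if total % 4 != 0:
            raise MalformedAttribute(f"TLV total length {total} is not a multiple of four")
        if offset + total > len(data):
            raise MalformedAttribute("TLV extends past the end of the attribute")
        tlvs.append((tlv_type, data[offset + 2 : offset + total]))
        offset += total
    return tlvs


check_attribute_length(bytes(8))  # passes silently
print(parse_tlvs(bytes([1, 2, 0xAA, 0xBB])))  # [(1, b'\xaa\xbb')]
```

Under the other reading, the Length field itself must be a multiple of four. The usage example above would then be rejected, and every individual TLV would occupy two octets more than a multiple of four.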
   According to [RFC8562], if the downstream PE receives Down or AdminDown in the State field of the BFD Control packet or associated with the BFD session Detection Timer expires, the BFD session is

nit: "the BFD Detection Timer associated with the BFD session expires"

   PE, while others are considered as Standby Upstream PEs. In such a scenario, when the P-tunnel is considered down, the downstream PE MAY initiate a switchover of the traffic from the Primary Upstream PE to the Standby Upstream PE only if the Standby Upstream PE is deemed available.

I'm not sure that we've defined yet what it means for an Upstream PE to be deemed "available". I guess it's possible that there is no established P-tunnel between the (selected) Standby Upstream PE and the downstream PE. In that case, simply using the Up/Down/not-known-to-be-Down status of that P-tunnel is not an option...

   If the downstream PE's P-tunnel is already established when the downstream PE receives the new x-PMSI A-D Route with BFD Discriminator attribute, the downstream PE MUST associate the value of BFD Discriminator field with the P-tunnel and follow procedures listed above in this section if and only if the x-PMSI A-D Route was properly processed as per [RFC6514], and the BFD Discriminator attribute was validated.

We did not discuss any validation of the BFD Discriminator attribute in §3.1.6; what procedures would this process entail?

Section 4

   The procedures described below are limited to the case where the site that contains C-S is connected to two or more PEs, though, to simplify the description, the case of dual-homing is described. The

I suggest giving at least some consideration to how to choose among multiple Standby Upstream PEs when more than one is available.

   procedures require all the PEs of that MVPN to follow the same UMH selection procedure, as specified in [RFC6513], whether the PE selected based on its IP address, hashing algorithm described in section 5.1.3 of [RFC6513], or Installed UMH Route.
   The procedures

I assume that the way the PEs agree on which procedure is in use does not involve advertising anything in-band, and is out of scope for this document. But please say so!

   assume that if a site of a given MVPN that contains C-S is dual-homed to two PEs, then all the other sites of that MVPN would have two unicast VPN routes (VPN-IPv4 or VPN-IPv6) to C-S, each with its RD.

nit: s/its RD/its own RD/

Also, please confirm that the unicast routes are *to* C-S, vs *from* it.

Section 4.1

   o the NLRI is constructed as the C-multicast route with an RT that identifies the Primary Upstream PE, except that the RD is the same as if the C-multicast route was built using the Standby Upstream PE as the UMH (it will carry the RD associated to the unicast VPN route advertised by the Standby Upstream PE for S and a Route Target derived from the Standby Upstream PE's UMH route's VRF RT Import EC);

This part is a bit confusing to me. The first part says that the RT identifies the Primary Upstream PE, but the second part says that the RT is derived from the Standby Upstream PE's [stuff]. But I'm happy to trust you that the [stuff] makes it correct!

Section 4.2

   when the PE determines (the use of the particular method to detect the failure is outside the scope of this document) that C-S is not reachable through some other PE, the PE SHOULD install VRF PIM

It seems like a forward reference to §4.3 might be helpful.

   Section 9.3.2 of [RFC6514], describes the procedures of sending a Source-Active A-D Route as a result of receiving the C-multicast route. These procedures MUST be followed for both the normal and Standby C-multicast routes.

There is no section 9.3.2 in RFC 6514. There is a 9.2.3 that looks perhaps plausible, though the string "Source-Active" does not appear in it.

Section 4.4.2

   Source AS carried in the C-multicast route.
   If the match is found, and the C-multicast route carries the Standby PE BGP Community, then the ASBR MUST perform as follows:

(I assume that there is room for local policy to modify this "MUST", e.g., if needed to protect against some form of attack ... perhaps it even goes without saying.)

Section 5

   o Upstream PEs use the "hot standby" optional behavior and thus will forward traffic for a given multicast state as soon as they have whether a (primary) BGP C-multicast route or a Standby BGP C-multicast route for that state (or both)

nit: the grammar is a bit odd here, after "as soon as they have"; I'm not confident that I could suggest an accurate fix.

Section 6

I could almost see the discussion of duplicate packets as a subsection of the security considerations, though I don't mind leaving it as-is.

Section 8

We could perhaps make some pro forma note about the BFD Discriminator attribute. Like all BGP attributes, it typically does not benefit from cryptographic integrity protection. It could therefore be spoofed so as to differ from what the multipoint BFD head actually uses. That said, I'm willing to let this fall under the incorporated-by-reference BGP security considerations.

Is it worth noting that operating in "hot" standby mode will increase the general level of traffic on the VPN, and thus its susceptibility to DoS?

   This document uses P2MP BFD, as defined in [RFC8562], which, in turn, is based on [RFC5880]. Security considerations relevant to each protocol are discussed in the respective protocol specifications. An implementation that supports this specification MUST use a mechanism to control the maximum number of P2MP BFD sessions that can be active at the same time.

What is the objective that this control is designed to achieve?
I can "control the maximum number of sessions" by asserting the maximum number to be an absurdly large value, but I don't think that would meet the spirit of this requirement (it does meet the letter of the requirement).

   The methods described in Section 3.1 may produce false-negative state changes that can be the trigger for an unnecessary convergence in the control plane, ultimately negatively impacting the multicast service provided by the VPN. An operator is expected to consider the network environment and use available controls of the mechanism used to determine the status of a P-tunnel.

We mentioned earlier (e.g., in §3.1) that similar negative effects can occur when resiliency mechanisms at different layers interact; that might be worth repeating here.

_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess