[bess] Benjamin Kaduk's Discuss on draft-ietf-bess-mvpn-fast-failover-13: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker Mon, 14 Dec 2020 16:51:16 -0800

Benjamin Kaduk has entered the following ballot position for
draft-ietf-bess-mvpn-fast-failover-13: Discuss


When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-mvpn-fast-failover/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Let's talk about what the requirements are for consistency across PEs in
the algorithm for selecting the Primary Upstream PE.  Section 4 notes
that "all the PEs of that MVPN [are required] to follow the same UMH
selection procedure", but leaves the option of non-revertive behavior as
something that "MAY also be supported by an implementation", without
requirement for consistency across all PEs.  It seems to me that if some
PEs use non-revertive behavior and others do not, then they will
disagree as to which PE is the Primary (or active) PE in some cases,
which seems to conflict with the initial guidance that all PEs needed to
pick the same one.  Is it perhaps that the PEs need to agree on which PE
is to be advertised as Primary but not necessarily to actually be using
that one for traffic?  Or am I missing something?
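
To make the concern concrete, here is a minimal sketch of two downstream
PEs that see the same sequence of events but differ only in whether they
revert. The names PE1/PE2 and the function active_upstream are
hypothetical, and the model is a deliberate simplification, not the
draft's procedure.

```python
def active_upstream(events, revertive):
    """Return the Upstream PE a downstream PE ends up using.

    events: sequence of "down"/"up" strings describing the status of the
    P-tunnel from the Primary Upstream PE (hypothetical PE1).
    revertive: whether this downstream PE switches back to the Primary
    once its P-tunnel is Up again.
    """
    active = "PE1"  # assumption: the common UMH selection picked PE1
    for status in events:
        if status == "down":
            active = "PE2"  # fail over to the (hypothetical) Standby
        elif status == "up" and revertive:
            active = "PE1"  # only a revertive PE switches back
    return active


# Both PEs observe the Primary's P-tunnel go Down and then come back Up.
events = ["down", "up"]
revertive_choice = active_upstream(events, revertive=True)
non_revertive_choice = active_upstream(events, revertive=False)
```

In this simplified model the two PEs end up on different Upstream PEs
after the Primary recovers, which is the disagreement described above.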


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

   Section 3 describes local procedures allowing an egress PE (a PE
   connected to a receiver site) to take into account the status of
   P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
   (C-S, C-G).  [...]

Does it also apply to (C-*, C-G)?  (I'll just mention it once, but the
handling seems to be somewhat inconsistent throughout the document, with
(C-*,C-G) getting mentioned sometimes but not always, and no pattern
obvious to me for when it is or is not included.  I think I see some
instances where (C-*, C-G) does not make sense, so it would probably not
be a universal replacement.)

   Section 5 describes a "hot leaf standby" mechanism that can be used
   to improve failover time in MVPN.  The approach combines mechanisms
   defined in Section 3 and Section 4 has similarities with the solution
   described in [RFC7431] to improve failover times when PIM routing is
   used in a network given some topology and metric constraints.

nit: grammar issue around "has similarities with" (maybe needs a leading
"and"?)

   VPNs.  An operator would enable these mechanisms using a method
   discussed in Section 3 in combination with the redundancy provided by
   a standby PE connected to the source of the multicast flow, and it is
   assumed that all PEs in the network would support these mechanisms
   for the procedures to work.  In the case that a BGP implementation

Is it a matter of "the procedure will not work at all unless all PEs in
the network support it", or "only the PEs that support it will get the
benefits of it"?  [The next sentence suggests an answer...]

Section 3

   Section 9.1.1 of [RFC6513] are applicable when using I-PMSI
   P-tunnels.  That document is a foundation for this document, and its
   processes all apply here.  Section 9.1.1 mandates the use of specific
   procedures for sending intra-AS I-PMSI A-D Routes.

(nit) the second "Section 9.1.1" is also referring to RFC 6513, not this
document, which would be the default interpretation of a bare section
reference.

(not-nit) The referenced procedure seems to be about processing, not
sending, intra-AS I-PMSI A-D routes.  Am I misreading something?

Section 3.1

   Different factors can be considered to determine the "status" of a
   P-tunnel and are described in the following sub-sections.  The
   optional procedures described in this section also handle the case
   the downstream PEs do not all apply the same rules to define what the
   status of a P-tunnel is (please see Section 6), and some of them will
   produce a result that may be different for different downstream PEs.

nit: I think it's better to put a word like "where" in "the case the
downstream PEs".

Section 3.1.3

   corresponding P-tunnel MUST be re-evaluated.  If the P-tunnel
   transitions from Up to Down state, the Upstream PE that is the
   ingress of the P-tunnel MUST NOT be considered a valid UMH.

(nit?) I'm not sure how much precedent there is for using "valid" in
this context -- IIUC the previous discussion of this process referred
only to whether a PE is a candidate for being the UMH.

Section 3.1.5

   When such a procedure is used, in the context where fast restoration
   mechanisms are used for the P-tunnels, a configurable timer MUST be
   set on the downstream PE to wait before updating the UMH, to let the
   P-tunnel restoration mechanism to execute its actions.  An
   implementation SHOULD use three seconds as the default value for this
   timer.

How does this interact with the value of the maximum inter-packet time?
Suppose that I know to expect at least one packet every ten seconds.  Do
I wait ten seconds after receiving the last packet and then another
three seconds, before engaging in an UMH change?
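
The two readings I can imagine differ only in arithmetic. The sketch
below uses the numbers from the question (ten-second maximum
inter-packet time, three-second default timer); both function names and
both interpretations are assumptions for illustration, since the draft
does not say which (if either) is intended.

```python
def delay_sequential(max_gap_s: float, hold_s: float) -> float:
    """Hold timer starts only after the inter-packet time has elapsed."""
    return max_gap_s + hold_s


def delay_concurrent(max_gap_s: float, hold_s: float) -> float:
    """Hold timer and inter-packet time both start at the last packet."""
    return max(max_gap_s, hold_s)


# Seconds between the last received packet and the UMH update.
sequential = delay_sequential(10.0, 3.0)
concurrent = delay_concurrent(10.0, 3.0)
```

Under the first reading the UMH change happens thirteen seconds after
the last packet; under the second, ten.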

   In cases where this mechanism is used in conjunction with the method
   described in Section 5, no prior knowledge of the rate of the
   multicast streams is required; downstream PEs can compare reception
   on the two P-tunnels to determine when one of them is down.

This feels a little underspecified; is there a reference or more
guidance that we could give about turning a stream of received packets
on one tunnel into a maximum inter-packet time on another tunnel,
supposedly carrying the same traffic?

Section 3.1.6

      *  one octet-long field of TLV's Type value (Section 7.3)

      *  one octet-long field of the length of the Value field in octets

      *  variable length Value field.

      The length of a TLV MUST be multiple of four octets.

I assume this is the total length, not the value in the length field?
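
The distinction matters because the two quantities differ by the two
header octets. A minimal sketch, assuming the layout quoted above (one
octet of Type, one octet of Length, then Value); the function name and
the Type value 1 are hypothetical:

```python
def tlv_lengths(tlv: bytes):
    """Return (length_field, total_length) for a single TLV.

    Assumes a one-octet Type, a one-octet Length that counts only the
    Value octets, and then the Value itself.
    """
    if len(tlv) < 2:
        raise ValueError("TLV shorter than its two-octet header")
    length_field = tlv[1]
    total_length = 2 + length_field  # Type + Length + Value
    if len(tlv) < total_length:
        raise ValueError("TLV truncated")
    return length_field, total_length


# Hypothetical TLV: Type 1, two octets of Value.
length_field, total_length = tlv_lengths(bytes([1, 2, 0xAA, 0xBB]))
```

Here the total length (4) is a multiple of four but the value in the
length field (2) is not, so the two interpretations accept different
TLVs.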

   The BFD Discriminator attribute MUST be considered malformed if its
   length is not a non-zero multiple of four.  If the attribute
   considered malformed, the UPDATE message SHALL be handled using the
   approach of Attribute Discard per [RFC7606].

nit: s/attribute considered/attribute is considered/

Section 3.1.6.1

   o  MUST periodically transmit BFD Control packets over the x-PMSI
      P-tunnel after the P-tunnel is considered established.  Note that
      the methods to declare a P-tunnel has been established are outside
      the scope of this specification.

Is there a good reference for how to choose the period of transmission?

   If the tracking of the P-tunnel by using a P2MP BFD session is
   enabled after the x-PMSI A-D Route has been already advertised, the
   x-PMSI A-D Route MUST be re-sent with precisely the same attributes
   as before and the BFD Discriminator attribute included.

Pedantically, it seems like "precisely the same attributes as before"
is incompatible with adding the BFD Discriminator attribute.  Phrasing
that discusses "the only change between the previous advertisement and
the new advertisement" would not suffer from such a potential issue.
(And similarly for when the BFD Discriminator attribute is to be
removed, a couple paragraphs later.)

Section 3.1.6.2

   o  MUST use the source IP address of the BFD Control packet, the
      value of the BFD Discriminator field, and the x-PMSI Tunnel
      Identifier [RFC6514] the BFD Control packet was received to
      properly demultiplex BFD sessions.

nit: missing word around "the BFD Control packet was received" (maybe
"received on/in"?).
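
For what it's worth, I read the demultiplexing rule as a lookup on a
three-part key. A minimal sketch under that assumption; SessionKey,
SessionTable, the session names, and the example address and tunnel
identifiers are all hypothetical:

```python
from typing import Dict, NamedTuple, Optional


class SessionKey(NamedTuple):
    source_ip: str      # source IP address of the BFD Control packet
    discriminator: int  # value of the BFD Discriminator field
    tunnel_id: bytes    # x-PMSI Tunnel Identifier the packet arrived on


class SessionTable:
    """Maps the three-part key to a locally named BFD session."""

    def __init__(self) -> None:
        self._sessions: Dict[SessionKey, str] = {}

    def add(self, key: SessionKey, name: str) -> None:
        self._sessions[key] = name

    def demux(self, source_ip: str, discriminator: int,
              tunnel_id: bytes) -> Optional[str]:
        # Returns None when no session matches all three fields.
        key = SessionKey(source_ip, discriminator, tunnel_id)
        return self._sessions.get(key)


table = SessionTable()
table.add(SessionKey("192.0.2.1", 7, b"tunnel-A"), "session-A")
table.add(SessionKey("192.0.2.1", 7, b"tunnel-B"), "session-B")
```

With this keying, two packets with the same source address and
discriminator still map to different sessions when they arrive on
different P-tunnels.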

   According to [RFC8562], if the downstream PE receives Down or
   AdminDown in the State field of the BFD Control packet or associated
   with the BFD session Detection Timer expires, the BFD session is

nit: "the BFD Detection Timer associated with the BFD session expires"

   PE, while others are considered as Standby Upstream PEs.  In such a
   scenario, when the P-tunnel is considered down, the downstream PE MAY
   initiate a switchover of the traffic from the Primary Upstream PE to
   the Standby Upstream PE only if the Standby Upstream PE is deemed
   available.

I'm not sure that we've defined what it means for an Upstream PE to be
deemed "available", yet.  I guess it's possible that there is not an
established P-tunnel between the (selected) Standby Upstream PE and the
downstream PE, so just using the Up/Down/not-known-to-be-Down status of
that P-tunnel is not an option...

   If the downstream PE's P-tunnel is already established when the
   downstream PE receives the new x-PMSI A-D Route with BFD
   Discriminator attribute, the downstream PE MUST associate the value
   of BFD Discriminator field with the P-tunnel and follow procedures
   listed above in this section if and only if the x-PMSI A-D Route was
   properly processed as per [RFC6514], and the BFD Discriminator
   attribute was validated.

We did not discuss any validation of the BFD Discriminator attribute in
§3.1.6; what procedures would this process entail?

Section 4

   The procedures described below are limited to the case where the site
   that contains C-S is connected to two or more PEs, though, to
   simplify the description, the case of dual-homing is described.  The

I suggest giving at least some consideration to how to choose between
multiple Standby Upstream PEs when more than one is available.

   procedures require all the PEs of that MVPN to follow the same UMH
   selection procedure, as specified in [RFC6513], whether the PE
   selected based on its IP address, hashing algorithm described in
   section 5.1.3 of [RFC6513], or Installed UMH Route.  The procedures

I assume that how the PEs agree on which procedure is in use does not
involve something being advertised in-band, and is out of scope for this
document.  But please say so!

   assume that if a site of a given MVPN that contains C-S is dual-homed
   to two PEs, then all the other sites of that MVPN would have two
   unicast VPN routes (VPN-IPv4 or VPN-IPv6) to C-S, each with its RD.

nit: s/its RD/its own RD/
Also, please confirm that the unicast routes are *to* C-S, vs *from* it.

Section 4.1

   o  the NLRI is constructed as the C-multicast route with an RT that
      identifies the Primary Upstream PE, except that the RD is the same
      as if the C-multicast route was built using the Standby Upstream
      PE as the UMH (it will carry the RD associated to the unicast VPN
      route advertised by the Standby Upstream PE for S and a Route
      Target derived from the Standby Upstream PE's UMH route's VRF RT
      Import EC);

This part is a bit confusing to me, since the first part says that the
RT identifies the Primary Upstream PE, but the second part says that the
RT is derived from the Standby Upstream PE's [stuff].  But I'm happy to
trust you that the [stuff] makes it correct!

Section 4.2

      when the PE determines (the use of the particular method to detect
      the failure is outside the scope of this document) that C-S is not
      reachable through some other PE, the PE SHOULD install VRF PIM

It seems like a forward reference to §4.3 might be helpful.

   Section 9.3.2 of [RFC6514], describes the procedures of sending a
   Source-Active A-D Route as a result of receiving the C-multicast
   route.  These procedures MUST be followed for both the normal and
   Standby C-multicast routes.

There is no section 9.3.2 in RFC 6514.  There is a 9.2.3 that looks
perhaps plausible, though the string "Source-Active" does not appear in
it.

Section 4.4.2

   Source AS carried in the C-multicast route.  If the match is found,
   and the C-multicast route carries the Standby PE BGP Community, then
   the ASBR MUST perform as follows:

(I assume that there is room for local policy to modify this "MUST",
e.g., if needed to protect against some form of attack ... perhaps it
even goes without saying.)

Section 5

   o  Upstream PEs use the "hot standby" optional behavior and thus will
      forward traffic for a given multicast state as soon as they have
      whether a (primary) BGP C-multicast route or a Standby BGP
      C-multicast route for that state (or both)

nit: the grammar is a bit weird here, after "as soon as they have"; I'm
not confident that I could make an accurate suggestion for a fix.

Section 6

I could almost see the discussion of duplicate packets as being a
subsection of the security considerations, though I don't mind leaving
it as-is.

Section 8

We could perhaps make some pro forma note that the BFD Discriminator
attribute, like all BGP attributes, typically does not benefit from
cryptographic integrity protection and thus could be spoofed so as to be
different than what is actually used by the multipoint BFD head.  That
said, I'm willing to let this fall under the incorporated-by-reference
BGP security considerations.

Is it worth noting that operating in "hot" standby mode will increase
the general level of traffic on the VPN and thus susceptibility to DoS?

   This document uses P2MP BFD, as defined in [RFC8562], which, in turn,
   is based on [RFC5880].  Security considerations relevant to each
   protocol are discussed in the respective protocol specifications.  An
   implementation that supports this specification MUST use a mechanism
   to control the maximum number of P2MP BFD sessions that can be active
   at the same time.

What is the objective that this control is designed to achieve?  I can
"control the maximum number of sessions" by asserting the maximum number
to be an absurdly large value, but I don't think that would meet the
spirit of this requirement (it does meet the letter of the requirement).

   The methods described in Section 3.1 may produce false-negative state
   changes that can be the trigger for an unnecessary convergence in the
   control plane, ultimately negatively impacting the multicast service
   provided by the VPN.  An operator is expected to consider the network
   environment and use available controls of the mechanism used to
   determine the status of a P-tunnel.

We mentioned earlier (e.g., in §3.1) that similar negative effects can
occur when resiliency mechanisms at different layers interact; that
might be worth repeating here.



_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess
