Re: [bess] AD Review of draft-ietf-bess-multicast-damping-03 / -04

thomas.morin Mon, 21 Mar 2016 10:20:56 -0700

Hi Alvaro,

We posted -04 last week, based on the discussion we had.
Please tell us if you think this is ready to move forward.


Best,

-Thomas

2016-03-09, Alvaro Retana (aretana):

On 2/24/16, 11:58 AM, "Thomas Morin" <thomas.mo...@orange.com> wrote:


Thomas:

Hi!

There are several places where we're still not in sync.  Please see below.

Maybe talking in person would be good.

Thanks!

Alvaro.




    2016-02-23, Alvaro Retana (aretana):

    The Abstract says that the procedures are "inspired from
    BGP unicast route damping".  It seems to me that the intent is in
    fact to adopt the algorithm from RFC2439.  However, the text is
    not explicit/clear about that.

    Saying so would, I think, actually be a misleading simplification;
    the reader might miss the following facts:
    - the proposal is to keep a dampened multicast VPN state up and
    advertised (in RFC2439, a dampened route stops being advertised)
    - the document is not BGP-specific but also specifies a
    mechanism for the PIM FSM
    - only the exponential decay is borrowed from RFC2439

This last sentence is what I meant above by "adopt the algorithm from RFC2439". In other words, your proposal is to dampen the multicast state using the exponential decay algorithm defined in RFC2439, right?

In your response to my detailed comments there are statements that confuse me a little… I put some more comments/questions below.




     1. As you all know, BGP damping has a troubled history: it has
        been considered useless, and there have even been
        recommendations (from RIPE, for example) not to use it.


    A few things are important to keep in mind:
    - the application context here is not the Internet
    - multicast state propagates in a very different fashion
    - the damping techniques are not the same
    - the side-effects of damping unicast and of damping multicast as
    proposed here are fundamentally different: here, damping has no
    impact on the service

    Overall, I would say that the weaknesses of RFC2439 and the
    recommendations in RFC7196 were well known to the co-authors, and
    that we came to the conclusion that multicast VPN damping as
    proposed here would not suffer from similar weaknesses.

    How did you arrive at the default and maximum values?


    By simulating with simple parameters and choosing conservative,
    low-risk values.
    Considering that, by design, whatever the parameters, multicast
    streams will be delivered unchanged, and that the only things you
    trade off are less dynamicity and possibly slightly increased
    bandwidth use, the default and maximum values do not have to be
    perfectly tuned.

Please include this type of information (above) in the document. Given the history of dampening in general, understanding the differences considered will go a long way and avoid more questions. ;-)

    It concerns me that there are no known implementations (from the
    Shepherd's report).

    This concern is valid.
    See below...

    Because of that, I think this document would be better suited as
    an Experimental RFC, with the explicit purpose of gaining
    experience with the values and determining the impact in live
    deployments (which then could support a standard version).
     Please consider changing the intended Status.

    Let me go back to why this proposal started: some lab testing
    showed that it was easy, in the lab, to create significant
    overload on the BGP stacks of PEs and RRs by flapping multicast
    state at the edge.  Having a Standards Track specification provide
    the appropriate tooling against this DoS risk seems to me to make
    sense.  I think that the proposed procedures are not close enough
    to the solution and problem addressed by RFC2439 to say that
    RFC2439's history is an argument for going through an Experimental
    RFC first.

You might be right on this last point (the history of RFC2439 shouldn't affect this document), especially given the explanation you gave earlier (where you talked about multicast being different, not intended for the Internet, etc.).

However, the rest of your answer (in that last paragraph) points only at experience in seeing the problem and testing the solution "in the lab", not outside the lab. In other words, both the motivation and the problem came from lab tests. Together with the fact that there are no implementations, this renews my proposal to make this document Experimental (as I said before, with the explicit purpose of gaining experience): is the problem observed in deployed networks, are the conditions there the same as or similar to those in the lab, and are the parameters adequate?

Note that I don't think that an RFC has to be in the Standards Track to prove useful.

…

     1. Are you adopting the exponential decay algorithm from
        RFC2439?  That seems to be what's happening because you are
        not explicitly defining a new algorithm, but some of the text
        leaves doubts.  For example:
          * "inspired from BGP unicast route damping"  I know the
            application is different, but if the algorithm is the
            same then please say it.

    The procedures associated to the exponential decay are different.

Are you referring to the procedures as to when a penalty is incurred (multicast state change vs routing update), action (maintain the state vs stop advertising a route), etc.??

    1.
          * Section 5.1. (PIM procedures)
              o "updating the *figure-of-merit* based on the decay
                algorithm must be done prior to this increment"  This
                statement seems to directly imply that the algorithm
                is used.  Please reorder the steps to explicitly call
                this one out, instead of plugging it in as an
                afterthought.  BTW, should the "must" be "MUST"?
                 Reordering should spare you from having to deal
                with that last question.


    I've revised the text to describe this step in its own bullet,
    prior to "updating the figure-of-merit", to avoid this
    "afterthought" impression and make it as mandatory as the other steps.
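    The ordering discussed above (decay first, then increment, then
    compare against the thresholds) can be sketched as follows. This is
    a minimal Python illustration, assuming one record per multicast
    state; the names (DampedState, on_event, damping_active), the choice
    of which events are penalized, and the threshold values are
    hypothetical and not taken from the draft.

```python
class DampedState:
    """Hypothetical per-(S,G) damping record; names are illustrative only."""

    def __init__(self, half_life, increment, cutoff, reuse):
        self.half_life = half_life  # decay-half-life, in seconds
        self.increment = increment  # penalty added per penalized event (assumption)
        self.cutoff = cutoff        # damping becomes active above this value
        self.reuse = reuse          # damping becomes inactive below this value
        self.fom = 0.0              # figure-of-merit
        self.last = 0.0             # time of the last decay update
        self.active = False         # is damping currently active?

    def _decay(self, now):
        # Exponential decay: the figure-of-merit halves every half_life seconds.
        elapsed = now - self.last
        self.fom *= 2.0 ** (-elapsed / self.half_life)
        self.last = now

    def _check_reuse(self):
        # Deactivate damping once the figure-of-merit falls below reuse.
        if self.active and self.fom < self.reuse:
            self.active = False

    def on_event(self, now):
        self._decay(now)            # step 1: decay BEFORE incrementing
        self._check_reuse()
        self.fom += self.increment  # step 2: penalize this event
        if self.fom > self.cutoff:  # step 3: compare against the cutoff
            self.active = True
        return self.active

    def damping_active(self, now):
        # Querying the state also brings the figure-of-merit up to date.
        self._decay(now)
        self._check_reuse()
        return self.active
```

    In this sketch the reuse check also runs before the new increment is
    added, so a state that has already decayed below the reuse threshold
    is released before being penalized again; that is a design choice of
    the sketch, not something the thread specifies.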

    1.
         *
              o "Same techniques as the ones described in [RFC2439]
                can be applied…" "Can be"?  This sentence seems to
                imply that what is described in RFC2439 is optional.
                 Are there other ways of determining the same thing?
                 What about the exponential decay algorithm?


    No, there are just multiple possible ways to update the
    figure-of-merit, including the ones RFC2439 mentions or details.

    I've reformulated the text to avoid misinterpretation:

      These specifications do not impose the use of a particular technique
       to update the *figure-of-merit* following the exponential decay
       algorithm based on the configured *decay-half-life*. In particular,
       the same techniques as the ones described in [RFC2439] can be
       applied.  The only requirements are that the *figure-of-merit* has
       to be updated prior to increasing it, and that its decay below the
       *reuse-threshold* has to be acted upon in a timely manner: in
       particular, if the recomputation is done periodically, the period
       should be low enough not to significantly delay the deactivation of
       damping on a multicast state beyond what the operator intended to
       configure (e.g., for a *decay-half-life* of 10s, recomputing the
       *figure-of-merit* each minute would result in a multicast state
       remaining damped for a much longer time than what the parameters
       are supposed to command).
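    The cost of periodic recomputation in the last sentence can be made
    concrete. Under exponential decay, a figure-of-merit F reaches the
    reuse threshold R after decay-half-life * log2(F/R) seconds, but a
    periodic recomputation only notices this at its next tick. A small
    Python sketch, assuming recomputation ticks aligned with the instant
    damping became active (the function names are hypothetical):

```python
import math


def time_to_reuse(fom, reuse, half_life):
    """Seconds until fom decays to reuse under pure exponential decay."""
    if fom <= reuse:
        return 0.0
    return half_life * math.log2(fom / reuse)


def detected_reuse_time(fom, reuse, half_life, period):
    """Time at which a recomputation running every `period` seconds
    first observes that fom has decayed to the reuse threshold."""
    t = time_to_reuse(fom, reuse, half_life)
    return math.ceil(t / period) * period
```

    For example, with a decay-half-life of 10 s and a figure-of-merit at
    twice the reuse threshold, time_to_reuse(3.0, 1.5, 10.0) is 10.0 s,
    whereas detected_reuse_time(3.0, 1.5, 10.0, 60.0) is 60.0 s: a
    one-minute recomputation period keeps the state damped six times
    longer than configured.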

If I'm understanding correctly (that the algorithm in RFC2439 is one possible way to update the figure-of-merit, but that others are possible), then:


 1. this text only augments my point about this document being
    experimental; the text is not specific as to how things should
    work, it leaves the door open to almost anything…which brings me
    back to the question about how do you know that the proposed
    defaults will work with anything…Experimental, etc.
 2. Even for the known method (the exponential decay in RFC2439), the
    text is not prescriptive enough to be a Standard.

    1.
         *
              o It would also help if the terminology was consistent.
                 For example, instead of "damping becomes active" use
                "suppressed".  I can see how "suppressed" may give
                the wrong impression as only the propagation of state
                is affected.  Explaining then how the terminology
                applies would make it easier to reuse, avoid
                confusion and be clear.  Note that there's no mention
                of RFC2439 in the terminology section.


    Using the "suppressed" term to describe a state that we
    artificially keep active is the most confusing thing that I can
    think of. As you say, this would give a wrong impression.  I would
    go as far as to say that the document would be barely understandable.

    But maybe we can add this to the terminology section:

        In these specifications, damping of a multicast state will be
        said to be "active" or "inactive". Note that the term used for
        a unicast route which is dampened is "suppressed", but we
        avoid this term in these specifications given that a dampened
        multicast state is kept active.

    Would that help?


Yes.

That would go in the Terminology section, right? As there are other RFC2439 terms, it would be good to make a blanket statement there about that too.


     2. Section 3. (Overview): "…it is expected that this technique
        will allow to meet the goals of protecting the
        multicast routing infrastructure control plane without a
        significant average increase of bandwidth".  In general, I
        want to make sure that the qualities of the solution and the
        expected results are properly reflected in the document. [I'm
        using the text above as the base for my comment, but the
        impact is larger.]  Some questions:
          * "…it is expected that this technique will…"  I wonder why
            an assertion can't be made that this technique can (vs
            just expecting that it will) address specific problems.
             Is it the case that experience is needed to make a
            stronger assertion?  Are the goals the same (or at least
            similar) in every network?  Are there implementations
            available?  If so, please consider an "Implementation
            Status" section (see rfc6982).  What has been the
            deployment experience?  This goes back to my comment
            above about the Intended Status of this document.


    "It is expected" reflects the idea that the slight increase in
    bandwidth will not be significant in most cases.
    We can expand the text a bit to explain the cases where that
    would not hold.

    Let me suggest the following reformulation:

    "That said, basic simulations of the exponential decay algorithm
    show that the multicast state churn can be drastically reduced
    without significantly increasing the duration for which multicast
    traffic is forwarded. Hence, using this technique will efficiently
    protect the multicast routing infrastructure control plane against
    the issues described here, without a significant average increase
    of bandwidth.  The exception will be a scenario where the network
    dimensioning does not allow extending the time a multicast flow is
    forwarded beyond the duration for which it is needed by receivers".
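    To illustrate the kind of churn reduction this paragraph refers to,
    here is a toy discrete-time simulation in Python. It is not the
    simulation the authors ran: the flap pattern, the one-second time
    step, penalizing every membership change, and the threshold values
    are all hypothetical. As in the proposal, a damped state is held
    joined upstream, so withdrawals are not propagated while damping is
    active.

```python
def count_updates(flap_period, duration, half_life=None, increment=1.0,
                  cutoff=3.0, reuse=1.5):
    """Count upstream state changes for a receiver that toggles its
    membership every `flap_period` seconds; half_life=None disables
    damping. All parameter values are illustrative assumptions."""
    fom, active = 0.0, False
    local = advertised = False
    updates = 0
    for t in range(duration):
        if half_life is not None:
            fom *= 2.0 ** (-1.0 / half_life)  # one second of exponential decay
            if active and fom < reuse:
                active = False                # damping becomes inactive
        if t % flap_period == 0:
            local = not local                 # membership change at the edge
            if half_life is not None:
                fom += increment              # penalize each change (assumption)
                if fom > cutoff:
                    active = True             # damping becomes active
        # A damped state is kept joined upstream instead of being withdrawn.
        wanted = active or local
        if wanted != advertised:
            advertised = wanted
            updates += 1
    return updates
```

    With a membership flap every 2 s over 100 s, count_updates(2, 100)
    propagates 50 upstream changes, while count_updates(2, 100,
    half_life=10.0) propagates 3; the price is that, once damping is
    active, the state stays joined and traffic keeps being forwarded.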

I don't know what the last sentence means. :-( What is "network dimensioning"? It sounds like not extending "the time a multicast flow is forwarded beyond the duration for which it is needed by receivers" is not a bad thing… Other than that last sentence, the text sounds clearer.

You again talk about simulation experience, which, however close it may have been to real conditions, is still just a simulation. It's ok to mention this because that is the experience you have. You also mention the exponential decay algorithm, but the text above about other possible methods takes me back to: what happens if a different method is used?

    1.
          * What specifically are the goals?  In a couple of places
            the text points back at Section 1. (Introduction),
            but I'm not sure exactly what the goals are.  Of special
            interest for understanding the goals is the part in
            Section 4.2. (Existing PIM, IGMP and MLD timers) where
            other solutions are discarded for not meeting them.
              o There is scattered text that talks about "…ensure
                that the load put on the BGP control plane, and on
                the P-tunnel setup control plane, remains under
                control…", "protecting these control planes…avoiding
                negative effects…although at the expense of a minimal
                increase in average of bandwidth use…".   However,
                the description is too vague to point at what
                can satisfy these goals and what can't.



    Section 1 says:
    - " Hence, mechanisms need to be put in place to ensure that the
    load put on the BGP control plane, and on the P-tunnel setup
    control plane, remains under control regardless of the frequency
    at which multicast memberships changes are made by end hosts."
    - then "This document describes procedures, remotely inspired from
    existing BGP route damping, aimed at protecting these control
    planes while at the same time avoiding negative effects on the
    service provided, although at the expense of a minimal increase in
    average of bandwidth use in the network."

    The intent was that the text would be enough to make the goals clear.

    Would the following change of the second sentence provide suitable
    detail to help understand what can satisfy these goals and what
    can't?

    [...] aimed at offering means to set an upper bound on the load of
    the affected control planes (BGP RFC6514 processing, and the P-tunnel
    control plane protocol in certain cases as well) while at the same
    time preserving the service provided (delivering the stream to the
    end user as requested), although at the expense of a minimal
    increase in average bandwidth use in the network.

    I see that we can reorder the text to avoid splitting the
    explanation of goals.

    The new text would look like the following:

       In VPN contexts, providing isolation between customers of a shared
       infrastructure is a core requirement, resulting in stringent
       expectations with regard to risks of denial-of-service attacks.

       By nature, multicast memberships change based on the behavior of
       multicast applications running on end hosts; hence, the frequency
       of membership changes can legitimately be much higher than the
       typical churn of unicast routing states.  Section 16 of [RFC6514]
       specifically spells out the need for damping the activity of
       C-multicast and Leaf Auto-discovery routes.

       Hence, mechanisms need to be put in place to ensure that the load
       put on the BGP control plane, and on the P-tunnel setup control
       plane, remains under control regardless of the frequency at which
       multicast membership changes are made by end hosts.

       This document describes procedures, remotely inspired from existing
       BGP route damping, aimed at offering means to set an upper bound on
       the amount of processing for the mVPN control plane protocols
       ([RFC6514], and the P-tunnel control plane protocol in certain
       cases as well), while at the same time preserving the service
       provided (delivering the stream to the end user as requested),
       although at the expense of a minimal increase in average bandwidth
       use in the network.

That text is better, but it still includes statements like "…ensure that the load…remains under control…", and later "set an upper bound". I'm not too happy with vague goals, as keeping something under control (for example) can mean many things, and the upper bound is not clearly defined. This upper bound is probably a function of the defaults chosen; explaining that (not in this section) would be nice.

…


     1. Section 5.2. (Procedures for multicast VPN state damping)
          * In the Introduction you write that "Section 16 of
            [RFC6514] specifically spells out the need for damping
            the activity…"  I think that RFC6514 does a lot more than
            that:  Section 16.1. (Dampening C-Multicast Routes)
            "proposes OPTIONAL route dampening procedures similar to
            what is described in [RFC2439]."   Those procedures look
            very similar to the ones in this document.  What is
            the difference?  Is the intent of this document to
            complement, replace or maybe update what is already
            specified in RFC6514?


    Indeed, the base ideas for dampening were already there when we
    wrote RFC6514.
    draft-ietf-bess-multicast-damping provides details on how to
    implement RFC6514 16.1.1, but this is not an update per se, as
    nothing in RFC6514 is changed.

    We can make that fully explicit by saying in Section 1:

       Section 16 of [RFC6514] specifically spells out the need for
       damping the activity of C-multicast and Leaf Auto-discovery
       routes, and outlines how to do it by "delay the advertisement of
       withdrawals of C-multicast routes".  These specifications provide
       appropriate detail on how to implement that and how to make it
       controllable by the operator.

That is an update: by clarifying and providing specifics you are in fact updating RFC6514. We want to mark it that way (and be explicit about it) because we want someone reading RFC6514 to refer to this document if wanting to implement dampening.

…



        Minor:


     1. In 4.2
          * s/prune override interval/J/P_Override_Interval


    I'd rather keep the plain text version.


"J/P_Override_Interval" is what this interval is called in rfc4601bis.

…




     2. Section 5.2. (Procedures for multicast VPN state damping)
          * There are several places in this section where rfc2119
            language is used to describe what an implementation
            should do that sound to me like an attempt to define
            functionality that is mandatory to implement (MTI).  I
            find that hard/impossible to enforce and would like to
            see the rfc2119 language removed.  Please see below.


    Yes, the MUSTs in 5.1 and 5.2 are intended to carry the meaning of
    "mandatory to implement".


[Skipping to the specific RFC2119 question.]
…

    I understand that you seem to prefer avoiding RFC2119 language for
    MTI things.
    But I don't know another way than RFC2119 language to indicate
    what is mandatory to implement to be compliant with a spec, and I
    think this is a fairly well established practice. This is not the
    first document to use RFC2119 to indicate MTI things.

    What is the rationale for not using RFC2119 language?


RFC2119 reads:

6. Guidance in the use of these Imperatives

   Imperatives of the type defined in this memo must be used with care
   and sparingly.  In particular, they MUST only be used where it is
   actually required for interoperation or to limit behavior which has
   potential for causing harm (e.g., limiting retransmisssions)  For
   example, they must not be used to try to impose a particular method
   on implementors where the method is not required for
   interoperability.


Two points from there:

 1. If it's not necessary for interoperation, then don't use them.
    Using this text as an example: "Implementation of [RFC6513]
    relying on the use of PIM to carry C-multicast routing information
    MUST support this technique."  Implementing dampening is not
    necessary for RFC6513 implementations to interoperate.  In fact,
    if one implementation enables dampening and the other doesn't,
    they will still interoperate.
 2. Note that the last sentence refers specifically to implementation
    choices.  Using this text as an example: "The choice to implement
    damping…is up to the implementor…implementing the BGP approach is
    RECOMMENDED."  There is no need to use "RECOMMENDED" because it is
    not necessary for interoperability and by using it you're trying
    to impose a specific method.


…


          * "…damping SHOULD NOT be applied to BGP routes of
            the following sub-types…"  Are there cases when it is ok?
             In other words, why is the "SHOULD NOT" not a "MUST NOT"?


    Maybe someone can find a case where this does not break things,
    under some conditions.
    We saw nothing mandating the use of "MUST NOT".


What conditions?

Someone finding a case sounds like Experimentation to me…





     2. Section 6.1. (Damping mVPN P-tunnel change events) "Possible
        ways to do so depend on the type of P-tunnel, and local
        implementation details are left up to the implementor. The
        following is proposed as example of how the above can be
        achieved."  Either you leave it as an implementation
        detail or you provide guidance.  If this document were
        Experimental, then providing guidance would be great!


    There is a gap between "example" and "guidance".
    I think an example can help the reader (implementor or deployer).
    Guidance would mean that we start influencing the implementor,
    which is not the idea here.


You already are influencing!  See above about MTI.



_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess
