Re: Need your help to make sure the draft-ietf-rtgwg-net2cloud-problem-statement readability is good.

Joel Halpern Tue, 22 Aug 2023 20:46:08 -0700

Thank you.

Joel


On 8/22/2023 11:40 PM, Linda Dunbar wrote:


Joel,

Thank you very much for your suggestion. We will take your suggested wording into the document:

/“When a site failure occurs, many instances can be impacted. When the impacted instances’ IP prefixes in a Cloud DC are not aggregated nicely, which is very common, one single site failure can trigger a huge number of BGP UPDATE messages. There are proposals, such as [METADATA-PATH], to enhance BGP advertisements to address this problem.”/



Linda

*From:* Joel Halpern <[email protected]>
*Sent:* Tuesday, August 22, 2023 6:03 PM
*To:* Linda Dunbar <[email protected]>

*Cc:* rtgwg-chairs <[email protected]>; [email protected]; [email protected]
*Subject:* Re: Need your help to make sure the draft-ietf-rtgwg-net2cloud-problem-statement readability is good.

I think I now understand your point. As a problem statement draft, I would replace the detailed description of the specific proposal with a more generic "There are proposals to enhance BGP advertisements to address this problem."


Yours,

Joel

On 8/22/2023 6:34 PM, Linda Dunbar wrote:

    Joel,

    I see your points. Please see my explanation below quoted by <ld>
    </ld>.

    *From:* Joel Halpern <[email protected]>
    *Sent:* Monday, August 21, 2023 11:34 PM
    *To:* Linda Dunbar <[email protected]>
    *Cc:* rtgwg-chairs <[email protected]>; [email protected];
    [email protected]
    *Subject:* Re: Need your help to make sure the
    draft-ietf-rtgwg-net2cloud-problem-statement readability is good.

    Thank you Linda.  Trimmed the agreements, including acceptable
    text from your reply. Leaving the two points that can benefit from
    a little more tuning.

    Marked <jmh2></jmh2>

    Yours,

    Joel

    On 8/22/2023 12:12 AM, Linda Dunbar wrote:

    Similarly, section 3.2 looks like it could apply to any operator.
    The reference to the presence or absence of IGPs seems largely
    irrelevant to the question of how partial failures of a facility
    are detected and dealt with.

    [Linda] Two reasons why the site failure described in Section 3.2
    does not apply to other networks:

     1. One DC can have many server racks concentrated in a small area,
        all of which can fail in a single event. By contrast, a regular
        network failure at one location only impacts the routers at
        that location, which quickly triggers a switch of services to
        the protection paths.
     2. Regular networks run an IGP, which can propagate inner
        fiber-cut failures quickly to the edge, whereas many DCs don’t
        run an IGP.

    <jmh>Given that even a data center has to deal with internal
    failures, and that even traditional ISPs have to deal with
    partitioning failures, I don't think the distinction you are
    drawing in this section really exists.  If it does, you need to
    provide stronger justification.  Also, not all public DCs have
    chosen to use just BGP, although I grant that many have. I don't
    think you want to argue that the folks who have chosen to use BGP
    are wrong.  </jmh>

    <ld> Are you referring to Network-Partitioning Failures in Cloud
    Systems?

    Traditional ISPs don’t host end services; they are responsible for
    transporting packets, so a protection path can reroute the
    packets. But a Cloud DC site/PoD failure makes all the hosts
    (prefixes) at that site unreachable. </ld>

    <jmh2> If a DC site fails, the services fail too.  Yes, the DC
    operator has to reinstantiate them.  But that is way outside our
    scope.  To the degree that they can recover by rerouting to other
    instances (whether using anycast or some other trick), it looks
    just like routing around failures in other cases, which BGP and
    IGPs can do.  I am still not seeing how this justifies any special
    mechanisms. </jmh2>
    <ld>

    You are correct that the protection is the same as in regular ISP
    networks.

    The paragraph is intended to say the following:

    When a site failure occurs, many instances can be impacted. When
    the impacted instances’ IP prefixes in a Cloud DC are not
    aggregated nicely, which is very common, one single site failure
    can trigger a huge number of BGP UPDATE messages. Instead of many
    BGP UPDATE messages to the ingress routers for all the instances
    impacted, [METADATA-PATH] proposes one single BGP UPDATE
    indicating the site failure. The ingress routers can switch all
    the instances that are associated with the site.

    </ld>
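
The scaling concern in the paragraph above can be illustrated with a rough sketch. It assumes that, in the best case, a router needs one BGP UPDATE entry per prefix that remains after aggregation; the prefixes and the helper name `withdrawals_needed` are hypothetical and appear neither in the draft nor in [METADATA-PATH].

```python
import ipaddress

def withdrawals_needed(prefixes):
    """Count the prefixes left after best-case aggregation.

    Assumption for illustration only: each remaining prefix needs its
    own withdrawal when the site hosting it fails.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return len(list(ipaddress.collapse_addresses(nets)))

# Hypothetical instance prefixes hosted at one failed site.
# Contiguous blocks collapse into a single /24.
aggregatable = ["10.0.0.0/26", "10.0.0.64/26", "10.0.0.128/26", "10.0.0.192/26"]
# Scattered blocks cannot be merged, so each one stays separate.
scattered = ["10.0.0.0/26", "10.0.5.64/26", "10.1.9.128/26", "10.7.3.192/26"]

print(withdrawals_needed(aggregatable))  # 1
print(withdrawals_needed(scattered))     # 4
```

With poorly aggregated prefixes the count grows with the number of impacted instances, which is the situation a single site-level indication is meant to avoid.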


_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg

