Re: [GROW] Review request for draft-szarecki-grow-abstract-nh-scaleout-peering-00

Robert Raszuk Fri, 01 Mar 2019 07:51:02 -0800

Hey Rafał,

> Just do not set BGP NH to ANH in export policy.


In your Junos cfg example (slide 28) ANH is applied to EBGP peers so I am
not sure how I can send something different to any IBGP peer:

[edit protocols bgp group PeerAS2]
type external;
egress-te {
    install-address 1.1.1.2;
    rib {
        inet.0;
    }
}
peer-as 2;
neighbor 11.1.1.1;
neighbor 11.1.1.5;


> [RJS] I agree. Memory footprint at RR is not an issue. Convergence at scale
> is.
>

Convergence is not a problem. In fact, no one should be counting on
protocol *convergence* for fast connectivity restoration after any kind of
failure these days.


> Let's assume that at site 1 I have 4 ASBRs connected to AS_2, each with 1
> session, and each ASBR learns 300k prefixes from AS_2, all of them best
> from each ASBR's POV. So 300k paths per ASBR, 300k pfx per ASBR, 1 path per
> prefix per ASBR.
>
> The RR gets 4 x 300k pfx with BGP NH set to the ASBR1-2-3-4 loopbacks, and
> sends them w/ ADD-PATH to the on-site CR.
>
> When the eBGP session of one ASBR (say ASBR1) fails, it has to withdraw
> 300k paths from the RR, and the RR needs to withdraw 300k paths from the CR.
> Until this is done, the CR will keep sending ¼ of the traffic to ASBR1, as
> BGP NH == loopback is still reachable.
>
> Now if ANH is used, the CR sees 4 paths per prefix with BGP NH == ANH1-2-3-4
> respectively. When the eBGP session of one ASBR (say ASBR1) fails, it
> removes ANH1 from the IGP and starts to withdraw 300k paths from the RR, and
> the RR needs to withdraw 300k paths from the CR. As soon as the CR sees the
> IGP update (ANH1 unreachable), it can mark all 300k paths that have BGP NH ==
> ANH1 unusable and stop forwarding to ASBR1. If the CR runs BGP PIC EDGE, this
> could be sub-second.
>


All of this works fine today out of the box if you do not set next hop self
on your ASBRs (typical cfg in non-MPLS networks :).
And if you do set next hop self there, as mentioned, you can control when it
is redistributed into the IGP (or removed from it) by tracking any object (or
set of objects) you define. Very simple.
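
To make the contrast concrete, here is a minimal Python sketch of the two restoration paths discussed above: one IGP event that makes a next hop unresolvable, versus one BGP withdraw per affected path. It is only an illustration under stated assumptions. The class CoreRouter, its methods, the helper build() and the next-hop names are hypothetical and do not come from the draft or the slides, and the prefix count is scaled down from the 300k example.

```python
from collections import defaultdict


class CoreRouter:
    """Toy model (hypothetical) of a CR resolving BGP next hops via the IGP."""

    def __init__(self):
        self.paths = defaultdict(set)   # prefix -> set of BGP next hops
        self.igp_reachable = set()      # next hops currently present in the IGP

    def learn(self, prefix, next_hop):
        self.paths[prefix].add(next_hop)

    def usable_next_hops(self, prefix):
        # A path is usable only while its BGP next hop resolves in the IGP.
        return {nh for nh in self.paths[prefix] if nh in self.igp_reachable}

    def igp_withdraw(self, next_hop):
        # A single IGP event invalidates every path that depends on next_hop.
        self.igp_reachable.discard(next_hop)
        return 1  # number of events the CR had to process

    def bgp_withdraw_all(self, next_hop):
        # Per-prefix BGP withdraws: one update per affected path.
        events = 0
        for next_hops in self.paths.values():
            if next_hop in next_hops:
                next_hops.discard(next_hop)
                events += 1
        return events


def build(num_prefixes, next_hops):
    """Build a CR that learned every prefix once via every next hop."""
    cr = CoreRouter()
    cr.igp_reachable.update(next_hops)
    for i in range(num_prefixes):
        for nh in next_hops:
            cr.learn(f"pfx-{i}", nh)
    return cr


NEXT_HOPS = ["nh1", "nh2", "nh3", "nh4"]
```

With build(1000, NEXT_HOPS), calling igp_withdraw("nh1") processes one event and immediately leaves three usable next hops per prefix, while bgp_withdraw_all("nh1") has to touch every prefix. The sketch does not care whether "nh1" is a peer address, a tracked loopback or an ANH; the only thing that matters is that its removal from the IGP is tied to the eBGP session state.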



> The inter-site operation, advertising only one path w/ a BGP NH
> representing the “set of eBGP sessions from a set of ASBRs”, is just one more
> optimization. Let's call this SP_ANH (Site-Peer ANH, in contrast to the
> ASBR-Peer ANH discussed above).
>
>
>    - If the RR advertises to other sites only one path and the BGP NH is the
>    loopback of one of the ASBRs (or an ASBR-Peer ANH), then what is the
>    convergence in case of this ASBR's failure? The RR has to send 300k paths
>    w/ a new BGP NH. Until this is done, remote sites will send traffic
>    somewhere else, not to the best egress site.
>
> Advertise not one but two paths each coming from different ASBRs.


>
>    - If the RR advertises all 4 paths to other sites and the BGP NH of each
>    is an ASBR loopback (or ASBR-Peer ANH), then an IGP update removing this
>    address allows for quick restoration (as the other 3 paths are available
>    everywhere). But in a multi-path scenario, we have just created a 2-3
>    level ECMP structure on remote BGP speakers:
>    prefix -> (*list of 4* BGP NH) -> each BGP NH -> *list of IGP* ECMP
>    neighbours. That is costly to manage in S/W and in H/W.
>
>
If you advertise ANH per ASBR you will still have 4 different next hops so
exactly the case as above.
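
For a rough sense of the forwarding-structure cost being compared here, the following Python sketch counts ECMP levels, groups and group members for a prefix resolved over N BGP next hops, each resolved over k IGP ECMP neighbours. The function name ecmp_footprint and the simple two-level counting model are assumptions made for illustration; real hardware may share or expand groups differently.

```python
def ecmp_footprint(bgp_next_hops, igp_neighbours_per_nh):
    """Return (levels, groups, members) for one prefix in a toy FIB model.

    Assumption: every BGP next hop resolves over its own IGP ECMP group,
    and a BGP-level group is only needed when there is more than one
    BGP next hop.
    """
    if bgp_next_hops < 1 or igp_neighbours_per_nh < 1:
        raise ValueError("both counts must be at least 1")

    igp_groups = bgp_next_hops
    igp_members = bgp_next_hops * igp_neighbours_per_nh

    if bgp_next_hops == 1:
        # prefix -> one BGP NH -> list of IGP ECMP neighbours
        return (1, igp_groups, igp_members)

    # prefix -> list of BGP NHs -> each BGP NH -> list of IGP ECMP neighbours
    return (2, 1 + igp_groups, bgp_next_hops + igp_members)
```

Under this model, 4 next hops with 3 IGP neighbours each need two levels, 5 groups and 16 members, whether those 4 next hops are ASBR loopbacks or per-ASBR ANHs, which is the point made above.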

If, however, you are proposing to advertise all paths from all ASBRs with the
same next hop (in your case AS-WIDE ANH) - brilliant - but how are you
assuring that all EBGP peers send you symmetric routes? If some EBGP
sessions give you only partial BGP reachability for whatever reason, and
you are still using the anycast ANH from all 4 ASBRs, the packets which the IGP
sends towards said ASBRs will be either dropped or looped between ASBRs
until TTL expires, as the next hop is still the ANH, so anycast.
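
The drop-or-loop outcome can be sketched in a few lines of Python. This is a toy model, not a description of any implementation: the function forward(), the node names and the next_anh_hop table (where an ASBR without an eBGP route sends traffic that resolves via the shared ANH) are all hypothetical.

```python
def forward(prefix, ingress_asbr, ebgp_routes, next_anh_hop, ttl=8):
    """Follow a packet that the IGP delivered to one anycast ANH advertiser.

    ebgp_routes:  ASBR name -> set of prefixes learned over its eBGP sessions
    next_anh_hop: ASBR name -> ASBR it forwards to when its only route for
                  the prefix has BGP NH == shared ANH (assumed behaviour);
                  a missing entry means it has nowhere to send the packet.
    Returns a (status, last_node) tuple.
    """
    node = ingress_asbr
    while ttl > 0:
        if prefix in ebgp_routes.get(node, set()):
            return ("delivered", node)       # node has a real egress path
        nxt = next_anh_hop.get(node)
        if nxt is None:
            return ("dropped", node)         # ANH resolves nowhere useful
        node = nxt                           # bounce towards another ASBR
        ttl -= 1
    return ("ttl_expired", node)


# Asymmetric eBGP feeds: only ASBR1 learned prefix "B".
EBGP_ROUTES = {"ASBR1": {"A", "B"}, "ASBR2": {"A"}, "ASBR3": {"A"}, "ASBR4": {"A"}}
# Assumed ANH resolution between ASBRs that lack the route; ASBR4 has none.
NEXT_ANH_HOP = {"ASBR2": "ASBR3", "ASBR3": "ASBR2"}
```

Calling forward("B", "ASBR2", EBGP_ROUTES, NEXT_ANH_HOP) bounces the packet between ASBR2 and ASBR3 until the TTL runs out, and the same call with ingress "ASBR4" drops it, even though ASBR1 could have delivered it. Prefix "A", which every session carries, is delivered from any ingress.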



>
>    - If the RR advertises to other sites only one path and the BGP NH is the
>    Site-Peer ANH as in this proposal, then the Site-Peer ANH is not removed
>    from the IGP (as other ASBRs still have sessions with the Peer). Remote
>    sites keep sending traffic using pre-failure data until the BGP update
>    from the RR comes. And when it comes, it will have the same BGP NH as the
>    pre-failure path, so there will be no need to update the FIB. Also the
>    FIB structure will be simpler and less costly:
>    prefix -> one BGP NH -> *list of* IGP ECMP neighbours.
>    Some merchant chips have really limited ECMP capability…
>
> See above.

See, when we worked on the concept of virtual BGP paths we did a lot of
analysis of this and reached the conclusion that, while possible, the
application of a given abstract or virtual next hop to a consistent union of
prefixes reachable over N EBGP sessions from single or multiple ASBRs must be
automated, as you cannot assure eBGP symmetry.
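
As a sketch of what such automation would have to compute at minimum, the Python below splits the prefixes received over a set of eBGP sessions into those present on every session (where a shared next hop is consistent) and those missing on at least one session. The function name split_by_symmetry and the session and prefix labels are hypothetical, and comparing prefixes alone is a simplification: a real check would also have to compare path attributes.

```python
def split_by_symmetry(session_prefixes):
    """Split prefixes by whether every eBGP session advertises them.

    session_prefixes: session name -> set of prefixes received on it.
    Returns (common, asymmetric) where `common` is the set of prefixes seen
    on all sessions and `asymmetric` maps each remaining prefix to the
    sorted list of sessions that do NOT carry it.
    """
    prefix_sets = list(session_prefixes.values())
    if not prefix_sets:
        return set(), {}

    union = set().union(*prefix_sets)
    common = set.intersection(*prefix_sets)

    asymmetric = {
        prefix: sorted(
            session
            for session, prefixes in session_prefixes.items()
            if prefix not in prefixes
        )
        for prefix in sorted(union - common)
    }
    return common, asymmetric
```

For sessions {"s1": {"a", "b", "c"}, "s2": {"a", "b"}, "s3": {"a", "c"}}, only "a" is safe to place behind one shared next hop; "b" is missing on s3 and "c" on s2, so traffic for them attracted to the wrong ASBR has no local exit.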

So even in your simplest case of ANH per ASBR per PeerAS there is zero
guarantee that you get identical paths from all peers. That means that
since you are going to keep the ANH in the IGP until the last session goes
away, you are going to attract traffic to such ASBRs until BGP has withdrawn
all affected non-symmetrical paths. Trust me, it would be much faster not to
set nh on the ASBR and just remove the peer's address from the IGP in one shot.
Then BGP can take however long it takes to "converge" without affecting the
data plane.

Many thx,
R.

_______________________________________________
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow
