Agreeing with T. Li here (i.e. BFD next-hops), and let me add that AFAIS
the confusion is that the presence of a /32 route is being used as SSAP
liveness. That's simply not what IGPs are here for if you consider their
main job to be the fastest possible convergence of network _reachability_
only, and not signalling of service failures. BGP is the overlay
synchronizing SSAPs & scales marvelously @ that. Having the liveness of
the BGP next-hop (which is basically equivalent to all services provided
behind it) indicate the health of the services behind it is the scalable
solution IMO, rather than trying to teach the IGP fragile signalling or
PUAM, which will easily affect its main job. (BTW, AFAIS PUAM will neither
scale nor work on generic graphs due to the lack of any consistent algebra
I could detect in the draft, and it is definitely nothing "like rift" as
the preso seems to claim again.) For signalling, I see how putting it into
a service instance is a somewhat palatable design choice, and it's kind of
like inventing "passive BFD" over flooding in my eyes ;-)

And BTW, in topologically sorted graphs (CLOS being the ones of interest
these days) with strict positive/negative disaggregation algebra and
minimal blast radius on failures, we can scale to (at least) 0.5M prefixes
implementation wise IME. That should allow us really, really big IP
fabrics with leaves holding nothing but defaults under normal conditions,
but it's still not a good idea to abuse that for SSAP synchronization
AFAIS. Observe that, to scale, RIFT does NOT notify leaves of their
reachability to each other; it simply prevents blackholing on aggregates
and will produce an ICMP unreachable if there are no routes left to the
destination. If you run BFD on top of that as Tony suggests, this will of
course give you the desired effect. For RR you'll run into the TCP session
problem again, but maybe you can BFD the RR session and then propagate
that as Robert seems to suggest; the third-party next-hop raises its head
again ;-)
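
To make the next-hop argument concrete, here is a minimal Python sketch of
an overlay table in which service routes resolve through a BGP next-hop and
a single liveness event (for example from BFD) invalidates everything
behind it. The class, method names, route names and addresses are all
hypothetical, chosen only for illustration; this is not how any particular
BGP implementation does it.

```python
from collections import defaultdict


class OverlayRib:
    """Toy overlay RIB: service routes resolve through a BGP next-hop."""

    def __init__(self):
        # next-hop -> set of service prefixes behind it (hypothetical layout)
        self.routes_by_nexthop = defaultdict(set)
        # next-hop -> liveness, assumed to be fed by something like BFD
        self.nexthop_up = {}

    def add_route(self, prefix, nexthop):
        self.routes_by_nexthop[nexthop].add(prefix)
        self.nexthop_up.setdefault(nexthop, True)

    def set_nexthop_liveness(self, nexthop, up):
        # One state change per next-hop, no matter how many service
        # routes sit behind it.
        self.nexthop_up[nexthop] = up

    def usable_routes(self):
        # A route is usable only while its next-hop is considered alive.
        return {
            prefix
            for nexthop, prefixes in self.routes_by_nexthop.items()
            if self.nexthop_up.get(nexthop, False)
            for prefix in prefixes
        }


rib = OverlayRib()
for i in range(1000):
    rib.add_route(f"vpn-route-{i}", "192.0.2.1")
rib.add_route("vpn-route-other", "192.0.2.2")

# A single "next-hop down" event withdraws all 1000 routes behind it.
rib.set_nexthop_liveness("192.0.2.1", False)
remaining = rib.usable_routes()
```

The point of the sketch is only that the liveness state lives per next-hop
in the overlay, so the IGP never has to carry per-service failure
signalling.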

Alternatively, resolving BGP over BGP as Robert suggests (if I read that
correctly) and using RR to scale out the SSAP nhop availability is
possible architecturally, I think, without garbage-canning IGPs as a
"network-wide fast broadcast mechanism" ... I doubt it will do "couple
millisecs" convergence ;-) but it can be simpler hardware wise than trying
to scale up BFD to a large number of very fast sessions.

-- tony



On Thu, Nov 18, 2021 at 5:06 PM Tony Li <tony...@tony.li> wrote:

>
> Les,
>
> Why would we then punch holes in the summary for member routers?  Just
> because we can?
> *[LES:] No. We are doing it to improve convergence AND retain scalability.*
>
>
>
> You are not improving convergence. You are propagating liveness.  The fact
> that this relates to convergence in the overlay is irrelevant to the IGP.
>
> You are not retaining scalability. You are damaging it. You are proposing
> flooding a prefix per router that fails. If there is a mass failure, that
> would result in flooding a large number of prefixes. The last thing you
> want when there is a mass failure is additional load, exacerbating the
> situation.
>
>
>  Should we corrupt the architecture just because we can?  There are other,
> architecturally appropriate solutions available.  How about we just use
> them?
>
> *[LES:] What are you proposing?*
>
>
>
> You are signaling the (lack of) liveness of a remote node. I propose that
> we instead use existing signaling mechanisms to do this. Multi-hop BFD
> seems like an obvious choice.
>
> If you greatly dislike that for some reason, I would suggest that we
> create a proxy liveness service, advertised by the ABR. This would allow
> correspondents to register for notifications. The ABR could signal these
> unicast when it determines that the specific targets are unreachable.
>
> Tony
>
>
>
_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
