Hi Peter, All,

From a BGP perspective (PE service nodes), detecting that a transport tunnel 
end-point has suddenly become unreachable is an operational problem. I think we 
all agree.
This problem exists in any multi-domain network, and is not limited to a 
multi-area/level IGP with summarization. Hence my doubts that simple encodings 
using the IGP as an API for unreachability signaling are an optimal solution.

Churning the LSDB for these things doesn't seem right. Would this mean that we 
hack the IGP implementation so we don't trigger SPFs on rx of these updates? 
Another concern is how we hook into BGP sideways to update it. Typically a 
router just looks at the RTM and tunnel-tables for reachability. Now it would 
have to check a separate bypass-list all the time. 
What about the pseudo-state? On startup I would imagine we would have to 
originate this PUA until a certain point?

Some consideration about installing the PUA route as a blackhole route: it does 
not seem to be an option, because resolution of BGP next-hops with blackhole /32 
routes has to continue to mean “drop” matching traffic, given the widespread 
use of this for DDoS protection. So there is a need for another “install” type 
for the “unreachable” IGP prefix, which does not exist yet.
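
To illustrate the distinction, here is a minimal sketch in Python. Everything 
in it is an assumption made up for illustration: the InstallType names, the 
resolve_next_hop() helper, and the dict standing in for the RTM do not come 
from either draft or from any implementation. It only shows why a blackhole 
/32 and an "unreachable" /32 would need different outcomes when BGP resolves 
a next-hop.

```python
import ipaddress
from enum import Enum


class InstallType(Enum):
    FORWARD = "forward"          # ordinary route (e.g. the area summary)
    BLACKHOLE = "blackhole"      # existing semantics: resolve, then drop traffic
    UNREACHABLE = "unreachable"  # hypothetical new type for a PUA-signaled prefix


def resolve_next_hop(rtm, next_hop):
    """Longest-prefix-match the BGP next-hop and report how it resolves."""
    addr = ipaddress.ip_address(next_hop)
    matches = [(net, kind) for net, kind in rtm.items() if addr in net]
    if not matches:
        return "unresolved"
    _, kind = max(matches, key=lambda m: m[0].prefixlen)
    if kind is InstallType.BLACKHOLE:
        # Must stay "resolved": DDoS protection relies on matching traffic being dropped.
        return "resolved-drop"
    if kind is InstallType.UNREACHABLE:
        # The next-hop is treated as gone, so BGP can move to a backup path.
        return "unresolved"
    return "resolved-forward"


# Toy RTM: a summary, an unreachable more-specific, and a DDoS blackhole /32.
rtm = {
    ipaddress.ip_network("10.0.0.0/16"): InstallType.FORWARD,
    ipaddress.ip_network("10.0.0.1/32"): InstallType.UNREACHABLE,
    ipaddress.ip_network("192.0.2.1/32"): InstallType.BLACKHOLE,
}
```

With this toy table, a next-hop of 10.0.0.2 resolves via the summary, 10.0.0.1 
no longer resolves even though the summary still covers it, and 192.0.2.1 keeps 
resolving to "drop" as it does today.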

Making an IGP-based prefix-unreachability signal successful PE-to-PE does not 
seem a trivial task, and involves more than simplistically dumping opaque 
link-state messages into the IGP and re-vectoring interior routing as an API. 
I'm a bit tormented regarding the potential evil caused to the IGP for signaling 
prefix-unreachability. It may not be worth the effort, especially when 
realizing that the problem space is not limited to multi-area/level 
summarization but instead exists in any multi-domain network.

Maybe the IETF should consider looking at the bigger picture, at service level, 
and document a full service-level solution framework instead of looking only at 
the IGP in isolation.

G/

-----Original Message-----
From: Peter Psenak <ppse...@cisco.com> 
Sent: Tuesday, June 14, 2022 5:46 PM
To: Van De Velde, Gunter (Nokia - BE/Antwerp) <gunter.van_de_ve...@nokia.com>; 
lsr <lsr@ietf.org>
Cc: draft-ppsenak-lsr-igp-ureach-prefix-annou...@ietf.org; 
draft-wang-lsr-prefix-unreachable-annoucement 
<draft-wang-lsr-prefix-unreachable-annoucem...@ietf.org>
Subject: Re: Thoughts about PUAs - are we not over-engineering?

Hi Gunter,

please see inline:

On 14/06/2022 10:59, Van De Velde, Gunter (Nokia - BE/Antwerp) wrote:
> Hi All,
> 
> When reading both proposals about PUAs:
> * draft-ppsenak-lsr-igp-ureach-prefix-announce-00
> * draft-wang-lsr-prefix-unreachable-annoucement-09
> 
> The identified problem space seems a correct observation, and indeed 
> summaries hide remote area network instabilities. It is one of the perceived 
> benefits of using summaries. The place in the network where this hiding has 
> the most impact on convergence is at service nodes (PEs for 
> L3/L2/transport), where due to the summarization it's difficult to detect that 
> the transport tunnel end-point has suddenly become unreachable. My concern, 
> however, is whether it really is a problem that is worth the LSR WG solving.

the request to address the problem is coming from the field. The scale of the 
networks in the field is growing significantly and the summarization is being 
implemented to keep the prefix scale under control.


> 
> To me, "draft-wang-lsr-prefix-unreachable-annoucement-09" is 
> not a preferred solution due to the expectation that all nodes in an 
> area must be upgraded to support the IGP capability. From this 
> operational perspective, 
> "draft-ppsenak-lsr-igp-ureach-prefix-announce-00" is more elegant, as 
> only the A(S)BRs and particular PEs must be upgraded to support 
> PUAs. I do have concerns about the number of PUA advertisements in 
> hierarchically summarized networks (/24 (site) -> /20 (region) -> /16 
> (core)). More specifically, in the /16 backbone area, how many of these 
> PUAs will be floating around creating LSP/LSDB update churn? How do we 
> keep the potentially exponential number of observed PUAs from 
> floating everywhere? (Will this lead to OSPF NSSA-type areas where 
> areas will be purged of these PUAs for scaling stability?)

A node going down is a rare event. The expected number of UPAs at any given time 
is very small. Implementations can limit the number of UPAs on ABR/ASBR in case 
of catastrophic events, in which case the UPAs would hardly help anyway.
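
As a rough sketch of what such a limit could look like (the UpaLimiter class, 
its method names, and the max_upas parameter are hypothetical and are not 
taken from either draft or from any implementation), an ABR/ASBR could simply 
refuse to originate new UPAs once a configured cap is reached:

```python
class UpaLimiter:
    """Hypothetical per-ABR cap on concurrently advertised UPAs."""

    def __init__(self, max_upas):
        self.max_upas = max_upas
        self.active = set()

    def advertise(self, prefix):
        """Return True if a UPA for this prefix is (or already was) advertised."""
        if prefix in self.active:
            return True
        if len(self.active) >= self.max_upas:
            # Cap reached (e.g. a large-scale failure): suppress further UPAs.
            return False
        self.active.add(prefix)
        return True

    def withdraw(self, prefix):
        """Stop advertising the UPA, freeing a slot under the cap."""
        self.active.discard(prefix)
```

With a cap of 2, a third simultaneous UPA would be suppressed until one of the 
first two is withdrawn.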

> 
> Long story short, should we not take a step back and re-think this identified 
> problem space? Is the proposed solution space not more evil than the problem 
> space? We do summarization because it brings stability and reduces the number 
> of link state updates within an area. And now with PUAs we re-introduce 
> additional link state updates (PUAs), and we blow up the LSDB with information 
> opaque to SPF best-path calculation. In addition, there is a suggestion of new 
> state-machinery to track the IGP reachability of 'protected' prefixes, and 
> there may be a desire to contain or filter updates across inter-area 
> boundaries. And finally, how will we represent and track PUAs in the RTM?

the problem space is valid, as confirmed by the field. As described 
above, the number of UPAs will be low, so there is no danger of 
defeating the purpose of the summarization.

> 
> What is wrong with simply not doing summaries and forgetting about these PUAs 
> that punch holes in the summary prefixes? This worked very well during the 
> last two decades. Are we not over-engineering with PUAs?

it's the scale of the current networks, which is growing exponentially, 
which demands the use of the summarization.


thanks,
Peter

> 
> G/
> 

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
