Greg –

Inline.

From: Greg Mirsky <gregimir...@gmail.com>
Sent: Monday, January 10, 2022 3:36 PM
To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
Cc: Tony Li <tony...@tony.li>; Christian Hopps <cho...@chopps.org>; Robert 
Raszuk <rob...@raszuk.net>; Aijun Wang <wangai...@tsinghua.org.cn>; Shraddha 
Hegde <shrad...@juniper.net>; Hannes Gredler <han...@gredler.at>; lsr 
<lsr@ietf.org>; Peter Psenak (ppsenak) <ppse...@cisco.com>
Subject: Re: [Lsr] BGP vs PUA/PULSE

Hi Les,
thank you for the detailed clarifications. Please find my follow-up notes 
in-lined below under the GIM>> tag.

Regards,
Greg

On Mon, Jan 10, 2022 at 3:19 PM Les Ginsberg (ginsberg) 
<ginsb...@cisco.com> wrote:
Greg –

The obvious issue is scale. Since you need a full mesh you are talking about 
N**2 behavior – so it doesn’t take many nodes to require thousands of BFD 
sessions.
GIM>> If I understand the scenario correctly, N represents the number of PEs, 
not the number of routers in ASes. If that is the case, what could be a good 
estimate for N?

[LES:] Even a modest-sized N = 100 (which is certainly not a high number) leads 
to 10,000 BFD sessions. N = 500 => 250,000 sessions. Etc.
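
The full-mesh arithmetic above can be sketched as follows. This is an illustrative calculation only; the function name `full_mesh_counts` is hypothetical and not taken from the thread. It prints the N**2 figure used above alongside two related counts: the N-1 sessions each PE terminates, and the N(N-1)/2 total if each bidirectional session between a pair of PEs is counted once.

```python
def full_mesh_counts(n: int) -> dict:
    """Session counts for a full mesh of n PEs (hypothetical helper).

    per_pe          -- sessions terminated on each PE (one per remote PE)
    unique_sessions -- bidirectional sessions, each PE pair counted once
    n_squared       -- the N**2 approximation quoted in the thread
    """
    return {
        "per_pe": max(n - 1, 0),
        "unique_sessions": n * (n - 1) // 2,
        "n_squared": n * n,
    }


if __name__ == "__main__":
    for n in (100, 500):
        counts = full_mesh_counts(n)
        print(
            f"N={n}: per PE={counts['per_pe']}, "
            f"unique={counts['unique_sessions']}, N**2={counts['n_squared']}"
        )
```

Whichever count is used, the growth is quadratic in N, which is the scaling concern being raised.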

In terms of detect time, we are trying to get an order of magnitude improvement 
from normal BGP session timers – so we are aiming for a modest number of 
seconds.
GIM>> That is very helpful information, thank you. Then, we can expect that a 
one-second interval for the transmission of a BFD Control packet would be 
acceptable and guarantee failure detection within three seconds. If that is the 
case, I'll note that many platforms support thousands of BFD sessions at 3.3 
msec intervals. It appears to me that the case we're discussing 
produces/processes roughly 300 times fewer BFD packets per session. That should 
help somewhat with the scaling, would you agree?
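
A rough sketch of the per-session packet-rate comparison, under stated assumptions: a detect multiplier of 3 (implied by "one-second interval" and "within three seconds"), detection time taken simply as interval times multiplier, and no allowance for transmit jitter or timer negotiation. All function names are hypothetical.

```python
def packets_per_second(tx_interval_ms: float) -> float:
    """BFD Control packets sent per second on one session."""
    return 1000.0 / tx_interval_ms


def detection_time_ms(tx_interval_ms: float, detect_mult: int = 3) -> float:
    """Simplified detection time: interval times detect multiplier (assumed 3)."""
    return tx_interval_ms * detect_mult


def per_pe_tx_pps(n_pes: int, tx_interval_ms: float) -> float:
    """Packets per second one PE transmits when it monitors every other PE."""
    return max(n_pes - 1, 0) * packets_per_second(tx_interval_ms)


if __name__ == "__main__":
    slow_ms, fast_ms = 1000.0, 3.3
    ratio = packets_per_second(fast_ms) / packets_per_second(slow_ms)
    print(f"rate ratio (3.3 ms vs 1 s): {ratio:.1f}")
    print(f"detection time at 1 s interval: {detection_time_ms(slow_ms):.0f} ms")
    print(f"per-PE transmit rate, N=500, 1 s interval: {per_pe_tx_pps(500, slow_ms):.0f} pps")
```

Under these assumptions the per-session ratio works out to about 303, and a PE in a 500-node full mesh would transmit about 499 packets per second at a one-second interval. The sketch covers only the transmit side of one PE; it says nothing about other BFD sessions already running on the node or about network-wide multi-hop traffic.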

[LES:] Nodes which can support thousands of BFD sessions are likely already 
using many BFD sessions for other purposes. In particular, fast detection of 
local failures is always going to be a priority – so if a node has thousands of 
neighbors – it will likely have thousands of single hop BFD sessions. Not to 
mention the plethora of other OAM use cases being defined. And the 
network-wide traffic impact as these new BFD sessions are largely multi-hop. 
Are you really arguing that the introduction of many thousands of BFD sessions 
is something we should not be concerned about?
   Les

   Les


From: Greg Mirsky <gregimir...@gmail.com>
Sent: Monday, January 10, 2022 1:30 PM
To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
Cc: Tony Li <tony...@tony.li>; Christian Hopps <cho...@chopps.org>; Robert 
Raszuk <rob...@raszuk.net>; Aijun Wang <wangai...@tsinghua.org.cn>; Shraddha 
Hegde <shrad...@juniper.net>; Hannes Gredler <han...@gredler.at>; lsr 
<lsr@ietf.org>; Peter Psenak (ppsenak) <ppse...@cisco.com>
Subject: Re: [Lsr] BGP vs PUA/PULSE

Hi Les,
thank you for bringing the real-life scenarios to the discussion. In your 
opinion, what prevents an operator from monitoring a remote PE using a 
multi-hop BFD? Do you have an estimated number of such sessions each PE must 
handle? What could be the required guaranteed failure detection time?

Best regards,
Greg

On Mon, Jan 10, 2022 at 1:08 PM Les Ginsberg (ginsberg) 
<ginsberg=40cisco....@dmarc.ietf.org> wrote:
Chris/Tony –

We have received requests from real customers who both need to summarize AND 
would like better response time to loss of reachability to individual nodes.
If they could operate at the necessary scale without summarizing they would 
have already – so telling customers to simply make sure they don’t use 
summaries isn’t helpful.

There are then two ways to respond:

1) Sorry, when you use summaries you lose the ability to receive state 
information about individual prefixes covered by the summary. There is nothing 
we can do to help you.

This seems to be what the two of you are saying.

2) We can provide a way to improve response time for the loss of reachability to 
individual destinations covered by a summary, but its use will be limited to 
isolated failures. Failures which affect a significant number of destinations 
at the same time will realize no benefit from the solution. If this limitation 
is acceptable then we have proposals that we think will be useful.

That’s what we are trying to do.

   Les



From: Tony Li <tony1ath...@gmail.com> On Behalf Of Tony Li
Sent: Monday, January 3, 2022 1:09 PM
To: Christian Hopps <cho...@chopps.org>
Cc: Peter Psenak (ppsenak) <ppse...@cisco.com>; Les Ginsberg (ginsberg) 
<ginsb...@cisco.com>; Robert Raszuk <rob...@raszuk.net>; Shraddha Hegde 
<shrad...@juniper.net>; Aijun Wang <wangai...@tsinghua.org.cn>; Hannes Gredler 
<han...@gredler.at>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] BGP vs PUA/PULSE



On Jan 3, 2022, at 11:23 AM, Christian Hopps 
<cho...@chopps.org> wrote:

And I'm saying if a prefix is important enough to merit a bunch of new protocol 
extensions and state, then it's important enough to simply be left out of the 
summarization in the first place.

And then people get what they want, w/o protocol changes/upgrades, and it's 
using time tested and hardened IGP code and designs.


+1

T

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr