Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" - draft-ietf-lsr-ospf-bfd-strict-mode-04

Les Ginsberg (ginsberg) Mon, 31 Jan 2022 11:48:20 -0800

Jeff –

I appreciate that you have been pulled into reading a very lengthy thread and 
then commenting on it, which is difficult and time-consuming to do accurately.
And I certainly welcome your input and agree with it.


I have not asked for BFD extensions.
I have stated that “IF” additional functionality is required from BFD that the 
proper place to discuss that is in the BFD WG – and such discussions are 
definitely not in scope of this draft.

The main content of this lengthy thread is Robert asking for additional 
specification in this draft and other folks (myself, Albert, Ketan) saying it 
doesn’t belong in this draft. Which is why I agree with everything you say 
below except for your perception that you are agreeing with Robert. You are 
actually agreeing with myself, Albert, Ketan. 😊

Thanx for your participation.

    Les

From: Jeffrey Haas <jh...@pfrc.org>
Sent: Monday, January 31, 2022 11:28 AM
To: Robert Raszuk <rob...@raszuk.net>
Cc: Ketan Talaulikar <ketant.i...@gmail.com>; Les Ginsberg (ginsberg) 
<ginsb...@cisco.com>; draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org; Acee Lindem 
(acee) <a...@cisco.com>; Albert Fu <af...@bloomberg.net>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" - 
draft-ietf-lsr-ospf-bfd-strict-mode-04

[Note that I read the LSR mailing list infrequently, but this thread was 
brought to my attention.]

I wish to largely support Robert's point here.  BFD is not intended as a link 
quality protocol.  It's a very simple hello protocol that can operate quite 
quickly and provide simple edge transition events of Up and Down.

There has been work in the BFD Working Group over the years to attempt to bring 
more of "link quality" behaviors to the protocol.  One, of interest to this 
thread, is the BFD for Large Packets work, which can support MTU probes as part 
of BFD operation.

draft-ietf-bfd-stability discusses leveraging BFD internal state to help look 
at link instability issues as BFD sees them.

And, of course, there were several times when Greg Mirsky wanted to get BFD to 
do more active behaviors.  He was encouraged to leverage the BFD machinery in 
his own non-BFD draft if he found it helpful.  I suspect he'll respond to this 
thread with comments on his thinking here.

That said, the BFD strict work is about removing control-plane protocol 
ambiguity with regard to how it uses BFD and how the state machines interact 
with each other.  I think that work has been reasonably done.

In such contexts, BFD is not meant to be more than a simple proxy for link 
quality: when the link is bad enough, BFD goes down and takes the client 
protocols down with it.  It's important for those client protocols and 
the operators to set the timers and Detection Multiplier (number of lost 
packets) to speeds they think support their needs.  If you have a noisy link 
that can drop several packets in succession and that's what you want to be your 
trigger, BFD is your protocol.  If you want it to take an apparently continuous 
loss over most of a second, BFD can do that too if you tune your timers 
appropriately.
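
As a rough illustration of the tuning described above: in asynchronous mode, 
RFC 5880 computes the local detection time as the Detect Mult received from 
the remote system multiplied by the agreed transmit interval (the greater of 
the local Required Min RX Interval and the remote Desired Min TX Interval). 
The sketch below only does that arithmetic. The function name and the timer 
values are assumptions chosen for the example, not values from this thread.

```python
def detection_time_ms(remote_detect_mult, local_required_min_rx_ms,
                      remote_desired_min_tx_ms):
    """Asynchronous-mode detection time, following the RFC 5880 rule."""
    # The agreed interval is the slower of what we are willing to receive
    # and what the peer wants to send.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Illustrative values only; none of these numbers come from the thread.
aggressive_ms = detection_time_ms(3, 50, 50)
relaxed_ms = detection_time_ms(3, 300, 300)
print(aggressive_ms, relaxed_ms)
```

With a multiplier of 3, 50 ms intervals declare the session down after 150 ms 
of silence, while 300 ms intervals tolerate 900 ms, i.e. most of a second.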

But, as you say Robert, it's not intended to be a general IPPM style tool.  I 
don't believe the BFD strict drafts should try to treat BFD as if it is one.

-- Jeff




On Jan 31, 2022, at 5:31 AM, Robert Raszuk <rob...@raszuk.net> wrote:

Hi Ketan & Les,

To finish this topic I would like to observe that IMHO you have it quite 
backwards.

Comment #1

The tone of your expressions is trying to illustrate that there can be many 
clients for a given link probing tool (here BFD). In reality the situation is 
vastly different. There is usually one link state IGP running on the node and 
a given set of probing protocols is associated with it. Moreover, the world 
does not end with BFD. BFD is just one possible tool, but more and more path 
probing tools are emerging or are already deployed. Asking each of them to 
introduce into its state machine a new behaviour to delay reporting UP state 
on a per client basis is nothing more than pushing the problem away and not 
caring about the cost associated with it.

Comment #2

BFD is a great tool to tell you if the end to end path is UP or DOWN. It was 
not designed to give you any characteristics or metrics for the path quality.

So all assertions of that notion in your draft are simply wrong. While there 
are indeed proposals to extend BFD probe packets with arbitrarily large 
payloads to tell you whether you can still reach the other end at some packet 
size, they still come nowhere close to measuring any form of link performance 
or detecting "A degraded or poor quality link".

Thx a lot,
R.

On Mon, Jan 31, 2022 at 5:48 AM Ketan Talaulikar <ketant.i...@gmail.com> wrote:
Hi Les,

I agree with you that mechanisms like dampening and hold-down are best achieved 
at the lowest levels (in this case in the monitoring protocol like BFD) instead 
of in each routing protocol on the top.

Now whether this means we include/support the signaling of the parameters for 
these mechanisms in BFD or whether they are achieved by provisioning (as done 
currently by some implementations) is best discussed in the BFD WG.

Thanks,
Ketan


On Mon, Jan 31, 2022 at 1:08 AM Les Ginsberg (ginsberg) <ginsb...@cisco.com> 
wrote:
Robert –

Here is what you said (emphasis added):

<snip>
But the timer I am suggesting is not related to BFD operation, but to OSPF 
(and/or ISIS). It is not about BFD sessions being UP or DOWN. It is about 
allowing BFD for more testing (with various parameters (for example increasing 
test packet size in some discrete steps) before OSPF is happy to bring the adj. 
up.
<end snip>

Point #1: If you want BFD to do more testing (such as MTU testing) then clearly 
you need extensions to BFD (such as 
https://datatracker.ietf.org/doc/draft-ietf-bfd-large-packets/ )

Point #2: The existing timers (which, as Ketan points out, are mentioned in 
Section 5) are applied today at the OSPF level precisely because OSPF does not 
currently have strict-mode operation. So in a flapping scenario you could see 
the following behavior:

a) BFD goes down
b) OSPF goes down in response to BFD
c) OSPF comes back up
d) Link is still unstable, so traffic is being dropped some of the time, but 
perhaps OSPF adjacency stays up (i.e., OSPF hellos get through often enough to 
keep the OSPF adjacency up)

So some implementations have chosen to insert a delay following “b”. This 
doesn’t guarantee stability, but hopefully makes it less likely. And because 
OSPF today does NOT wait for BFD to come up, the delay has to be implemented at 
the OSPF level.

Once you have strict mode support, the sequence becomes:

a) BFD goes down
b) OSPF goes down in response to BFD
c) BFD comes back up
d) OSPF comes back up
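
The difference between the two sequences above can be reduced to a single 
gating decision. The following is a toy sketch under simplifying assumptions: 
the function name and its boolean inputs are hypothetical, and the real OSPF 
neighbor FSM has more states than a single yes/no answer.

```python
def ospf_may_come_up(strict_mode, bfd_is_up, hellos_received):
    """Toy gate: may the OSPF adjacency form right now?"""
    if not hellos_received:
        # Without OSPF hellos there is no neighbor to form an adjacency with.
        return False
    # Without strict mode, OSPF does not wait for BFD.
    # With strict mode, the adjacency waits until BFD reports Up.
    return bfd_is_up or not strict_mode


# Unstable link: OSPF hellos get through, but BFD has not come back up yet.
without_strict = ospf_may_come_up(False, False, True)
with_strict = ospf_may_come_up(True, False, True)
with_strict_after_bfd_up = ospf_may_come_up(True, True, True)
print(without_strict, with_strict, with_strict_after_bfd_up)
```

In this model the adjacency forms immediately without strict mode, and only 
after BFD is Up with strict mode, which matches steps c) and d) above.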

Now, if the concern is that BFD comes back up while the link is still unstable, 
the way to address that is to put a delay either before BFD attempts to bring 
up a new session or a delay after achieving UP state before it signals UP to 
its clients – such as OSPF. This is a better solution because all BFD clients 
benefit from this. And if the link is still unstable, it is more likely that the 
BFD session will go down during the delay period than it would be for OSPF 
because the BFD timers are significantly more aggressive.
(BTW, this behavior can be done w/o a BFD protocol extension – it is purely an 
implementation choice.)
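
A minimal sketch of the second option above (delay after BFD reaches UP before 
signalling UP to clients) might look like the following. It is not taken from 
any implementation: the class name, method names, and the 5-second hold value 
are assumptions made for illustration, and time is passed in explicitly so the 
behavior is easy to follow.

```python
class BfdUpHoldTimer:
    """Hypothetical helper: tell clients UP only after BFD stays Up for hold_s."""

    def __init__(self, hold_s):
        self.hold_s = hold_s
        self.up_since = None       # time BFD last entered Up, None while Down
        self.signalled_up = False  # whether clients have been told UP

    def on_bfd_up(self, now):
        # Start the hold period; clients are not notified yet.
        if self.up_since is None:
            self.up_since = now

    def on_bfd_down(self, now):
        # Clients only hear DOWN if they were previously told UP.
        was_signalled = self.signalled_up
        self.up_since = None
        self.signalled_up = False
        return "DOWN" if was_signalled else None

    def poll(self, now):
        # Signal UP once the session has survived the whole hold period.
        if (self.up_since is not None and not self.signalled_up
                and now - self.up_since >= self.hold_s):
            self.signalled_up = True
            return "UP"
        return None


timer = BfdUpHoldTimer(hold_s=5.0)
timer.on_bfd_up(0.0)
early_poll = timer.poll(0.2)          # still inside the hold period
early_flap = timer.on_bfd_down(0.2)   # flap is hidden from clients
timer.on_bfd_up(1.0)
stable_poll = timer.poll(6.0)         # 5 s of continuous Up have elapsed
later_down = timer.on_bfd_down(7.0)
print(early_poll, early_flap, stable_poll, later_down)
```

Here a flap 200 ms after BFD comes up never reaches the clients, so a client 
such as OSPF sees UP only after the session has held for the full period.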

From a design perspective, dampening is always best done at the lowest layer 
possible. In most cases, interface layer dampening is best. If that is not 
reliable for some reason, then move one layer up – not two layers up.

   Les


From: Robert Raszuk <rob...@raszuk.net>
Sent: Sunday, January 30, 2022 10:05 AM
To: Ketan Talaulikar <ketant.i...@gmail.com>
Cc: Les Ginsberg (ginsberg) <ginsb...@cisco.com>; Acee Lindem (acee) 
<a...@cisco.com>; draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org; Albert Fu 
<af...@bloomberg.net>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" - 
draft-ietf-lsr-ospf-bfd-strict-mode-04

Hi Ketan,

I would like to point out that the draft discusses the BFD "dampening" or 
"hold-down" mechanism in Sec 5. We are aware of BFD implementations that 
include such mechanisms in a protocol-agnostic manner.

BFD dampening or hold-time are completely orthogonal to my point. Both have 
nothing to do with it.

Those timers only fire when BFD goes down. In my example BFD does not go down. 
But we want to bring up the client adj. only after X ms/sec/min etc. of normal 
BFD operation, if no failure is detected during that time.

This draft indicates that OSPF adjacency will "advance" in the neighbor FSM 
only after BFD reports UP.

And that is exactly too soon. In fact if you do that today without waiting some 
time (if you retire the current OSPF timer) you will not help at all in the 
case you are trying to address.

The reason is that BFD may go down perhaps 200 ms after reporting UP, but the 
OSPF adj. will already have been established. It is really pretty simple.

Thx,
Robert.

PS. And yes I think ISIS should also get fixed in that respect.
_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
