Hi, Greg:

 

I agree with you that BFD has better performance than other failure detection 
mechanisms, but we should also consider the scalability of the solutions.

Also, can the 10 ms detection time be guaranteed for multi-hop BFD in any network? I 
think there is also a limit on the number of BFD sessions each device can support.

 

 

Best Regards

 

Aijun Wang

China Telecom

 

From: Greg Mirsky <gregimir...@gmail.com> 
Sent: Tuesday, November 30, 2021 12:08 PM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk 
<rob...@raszuk.net>
Subject: Re: [Lsr] BFD aspects

 

Hi Aijun,

thank you for clarifying your goal. I missed asking another question:

What is the required failure detection time?

For example, a 10 ms detection guarantee is required for local protection, and 
that results in a 3.3 ms interval between the fault-detection packets (e.g., 
CCM or BFD). As I understand it, the IGP is likely to rely on single-hop BFD 
detection. Hence, up to 10 ms pass before the PE's neighbor discovers the failure, 
and only then do the IGP processes start acting. Thus, I don't see how the IGP can 
guarantee anything less than 10 ms. Would you agree?
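
To make the arithmetic above explicit, here is a minimal sketch of the standard BFD 
detection-time relationship (the detect multiplier of 3 is an assumed common default, 
not stated above):

  # Worst-case BFD/CCM detection time = transmit interval x detect multiplier.
  # The multiplier of 3 is an assumption (common default), not from the thread.
  tx_interval_ms = 3.3       # interval between fault-detection packets
  detect_multiplier = 3      # consecutive misses before declaring the session down
  detection_time_ms = tx_interval_ms * detect_multiplier
  print(f"Worst-case detection time: ~{detection_time_ms:.1f} ms")  # ~9.9 ms, within the 10 ms budget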

 

Regards,

Greg

 

On Mon, Nov 29, 2021 at 7:38 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

Hi, Greg:

 

I understand that BFD can provide a guaranteed failure detection time, unlike other 
protocols whose detection time depends on the size of the network.

What we want to emphasize is the balance between deployment/operation overhead 
and the efficiency of the proposed solutions.

For your question, I think we can still get millisecond-level failure detection 
via the IGP itself (far faster than the BGP hold timer for the BGP use case, 
and also a benefit for tunnel services that have no hello timer).

The actual time should certainly be verified later in a simulation environment or 
in a real network deployment.

 

Best Regards

 

Aijun Wang

China Telecom

 

From: Greg Mirsky <gregimir...@gmail.com> 
Sent: Tuesday, November 30, 2021 11:11 AM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk 
<rob...@raszuk.net>
Subject: Re: [Lsr] BFD aspects

 

Hi Aijun,

what is the guaranteed failure detection time for the IGP-based solution?

 

Regards,

Greg

 

On Mon, Nov 29, 2021 at 7:07 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

Hi, Greg:

 

Even if the BFD auto-configuration extensions were standardized and 
implemented, wouldn’t the network be filled with detection packets instead of 
user packets?

For the PUA/PULSE solution, the mentioned LSA is only originated when the node 
status changes from “UP” to “DOWN”, whereas BFD packets are sent continuously 
as long as these PEs are active.

Which one is more efficient?
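
As a rough, purely illustrative comparison of the two approaches (the PE count below 
is hypothetical, and the 3.3 ms interval is carried over from the detection-time 
discussion earlier in the thread):

  # Steady-state BFD load vs. event-driven PUA/PULSE advertisement (illustration only).
  pe_count = 1000            # hypothetical number of remote PEs monitored by one PE
  tx_interval_s = 0.0033     # 3.3 ms BFD transmit interval
  bfd_pps = pe_count / tx_interval_s   # packets this PE sends per second while all peers are up
  print(f"BFD: ~{bfd_pps:,.0f} packets/second per PE, continuously")

  # PUA/PULSE: nothing extra is flooded while the PEs stay up; one advertisement
  # is originated only on an UP -> DOWN transition of a node.
  print("PUA/PULSE: 0 packets while PEs are up; 1 advertisement per node failure")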

 

Certainly, we will consider massive failure situations, even though they will occur 
only in very rare circumstances.

 

Best Regards

 

Aijun Wang

China Telecom

 

From: Greg Mirsky <gregimir...@gmail.com> 
Sent: Tuesday, November 30, 2021 10:47 AM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk 
<rob...@raszuk.net>
Subject: Re: [Lsr] BFD aspects

 

Hi Aijun,

thank you for confirming that this is not a conclusion one can arrive at based on 
my discussion with Robert. Secondly, I wouldn't characterize the problem you 
describe as a scaling issue with using multi-hop BFD to monitor path continuity 
in the underlay network. In my opinion, it is operational overhead that can be 
addressed by an intelligent management plane or by a few extensions in the 
control plane that sets up the overlay. Since the management plane is usually a 
proprietary solution, I invite anyone interested to work on BFD auto-configuration 
extensions in the control plane. I would much appreciate references to use cases 
that can benefit from such extensions.

 

Regards,

Greg

 

On Mon, Nov 29, 2021 at 6:26 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

Hi, Greg:

 

Firstly, regardless of which method is used for the multi-hop BFD approach, 
there is certainly configuration overhead if you imagine 10,000 PEs, as Tony 
often raises as an example.

Wouldn’t you have to configure each pair of them to detect the PE-PE connection?

That is obviously not scalable.
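
A quick count of what a full PE-PE mesh of multi-hop BFD sessions implies (standard 
full-mesh arithmetic; the 10,000-PE figure is the example above):

  # Full-mesh multi-hop BFD session count for N PEs.
  n_pes = 10_000
  sessions_per_pe = n_pes - 1                # each PE runs a session to every other PE
  total_sessions = n_pes * (n_pes - 1) // 2  # each PE-PE pair counted once
  print(f"Per PE: {sessions_per_pe:,} sessions")        # 9,999
  print(f"Network-wide: {total_sessions:,} sessions")   # 49,995,000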

 

 

Best Regards

 

Aijun Wang

China Telecom

 

From: Greg Mirsky <gregimir...@gmail.com> 
Sent: Tuesday, November 30, 2021 10:18 AM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk <rob...@raszuk.net>; 
lsr <lsr@ietf.org>
Subject: Re: [Lsr] BFD aspects

 

Hi Aijun,

could you please elaborate on how you see this discussion leading to the 
"BFD based detection for the mentioned problem is not [...] scalable(among 
PEs)" conclusion? I hope that nothing I've said or suggested has led you to 
that conclusion. Personally, I believe that BFD-based PE-PE monitoring is the best 
technical solution. I understand that an operator may be dissatisfied with the 
additional configuration of the BFD sessions. As noted, I believe that can be 
addressed in the management plane or by minor extensions in the control plane (BGP 
or otherwise). If a particular implementation (or a combination of the implementation 
and HW) has a scaling challenge with multi-hop BFD, then that may not be 
sufficient technical justification for a somewhat controversial proposal.

 

Regards,

Greg

 

On Mon, Nov 29, 2021 at 5:17 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

From the discussion, I think we can conclude that BFD-based detection for the 
mentioned problem is neither reliable (between PE and RR) nor scalable (among PEs).

The same goes for the BGP-based solution.

 

So let’s focus on how to implement it within the IGP. Thanks for Greg’s analysis.

And one supplement to Robert’s comments: the RR is often not located within the 
same area as the PEs, and in that case it cannot immediately learn that a PE node 
is down when summarization is configured between areas.
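
A tiny sketch of why inter-area summarization masks the failure (the prefixes below 
are purely hypothetical):

  # With summarization, a remote RR/PE only sees the summary, which stays valid
  # even after one PE loopback inside it disappears, so the failure is hidden.
  import ipaddress

  summary = ipaddress.ip_network("10.0.0.0/16")      # advertised by the ABR into other areas
  pe_loopback = ipaddress.ip_network("10.0.1.1/32")  # the specific PE that went down

  print(pe_loopback.subnet_of(summary))  # True -> the down PE is still covered by the summary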

 

Best Regards

 

Aijun Wang

China Telecom

 

From: lsr-boun...@ietf.org On Behalf Of Gyan Mishra
Sent: Tuesday, November 30, 2021 8:44 AM
To: Robert Raszuk <rob...@raszuk.net>
Cc: Greg Mirsky <gregimir...@gmail.com>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] BFD aspects

 

 

Robert 

 

On Mon, Nov 29, 2021 at 7:35 PM Robert Raszuk <rob...@raszuk.net> wrote:

Hi Greg,

 

If BFD had autodiscovery built in, that would indeed be the ultimate 
solution. Of course, folks will worry about scaling and the number of BFD sessions 
to be run PE-PE. 

GIM>> I sense that it is not "BFD autodiscovery" but an advertisement of BFD 
multi-hop system readiness to the particular PE. That, as I think of it, can be 
done in a control or management plane.

 

Agreed. 

 

But if BFD between all PEs were an option, why would RR-to-PE BFD in the local area 
not be a viable solution? 

 

GIM>>Because, in the case of PE-PE, BFD control packets will be fate-sharing 
with data packets. But the path between RR and PE might not be used for 
carrying data packets at all.

 

100%. But that was accounted for, the reason being that you have at least two RRs 
in an area. The point of BFD here was to detect that a PE went down. 
 

Gyan> What Greg is alluding to is a very good point to consider: in many operator 
networks the RR sits in the “control plane” path, which is separate from the data 
plane path. The RR has no knowledge of the E2E forwarding-plane path between the 
PEs, as it sits outside that path. That being said, the PE-to-RR path is disjoint 
from the PE-PE path, so from the PE-RR point of view the RR may think a PE is up 
or down when it is not, producing a false positive or negative. That would be the 
case regardless of how many RRs are deployed.

 

You are absolutely right that it may report the RR's disconnection from the network 
while the PE is up and the data plane from remote PEs can still reach it. That is 
why we have more than one RR. 

 

As far as PE-PE BFD fate-sharing with real user data goes, I think that is not 
always the case. But this is a completely separate discussion :) 

 

Also, please keep in mind that a PE going down can be learned by the RRs by 
listening to the IGP. No BFD needed. 

 

Both would be multi-hop, and both would be subject to all transit failures, etc. 

GIM>> I think that there's a difference in the impact each path failure has 
on the data traffic. Monitoring the PE-PE path in the underlay, using the same 
encapsulation as the data traffic, is representative of the data 
experience. A failure of the PE-RR path, in my understanding, may not be 
representative at all. A BFD session between the RR and a PE may fail while the 
PE is absolutely functional from the service PoV. 

 

Please keep in mind that this entire discussion is not about end-to-end data plane 
failure :)  Yes, it's pretty sad. This entire debate is about indicating domain-wide 
that the IGP component on a PE went down. 

 

No one is considering data plane liveness, nor, as you observed, data plane 
encapsulation congruence. Clearly this is not a true OAM discussion. 

 

On the other hand, a PE might be disconnected from the service while its BFD 
session to the RR is in the Up state.

 

Not likely, if you keep in mind that to trigger any remote action, such a failure 
would have to happen to all RRs. 

 

Thx a lot,
R.

 


-- 


Gyan Mishra

Network Solutions Architect 

Email gyan.s.mis...@verizon.com

M 301 502-1347

 


_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
