Hi, Greg:
I agree with you that BFD performs better than other failure detection mechanisms, but we should also consider the scalability of the solutions. And can the 10 ms be guaranteed for multi-hop BFD in any network? I think there is also a limit on the number of BFD sessions on each device.

Best Regards

Aijun Wang
China Telecom

From: Greg Mirsky <gregimir...@gmail.com>
Sent: Tuesday, November 30, 2021 12:08 PM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk <rob...@raszuk.net>
Subject: Re: [Lsr] BFD aspects

Hi Aijun,
thank you for clarifying your goal. I have missed asking another question: What is the required failure detection time? For example, a 10 ms detection guarantee is required for local protection. And that results in a 3.3 ms interval between the fault detection packets (e.g., CCM or BFD). As I understand it, IGP is likely to rely on single-hop BFD detection. Hence, it takes 10 ms before the PE's neighbor discovers the failure. Then the IGP processes will start acting. Thus, I don't see how IGP can guarantee anything less than 10 ms. Would you agree?

Regards,
Greg

On Mon, Nov 29, 2021 at 7:38 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

Hi, Greg:

I understand that BFD can provide a guaranteed failure detection time, unlike other protocols whose detection time depends on the size of the network. What we want to emphasize is the balance between deployment/operation overhead and the efficiency of the proposed solutions.
For your question, I think we can still get millisecond-level failure detection via the IGP itself (far faster than the BGP hello timer for the BGP use case, and also a benefit for tunnel services that have no hello timer). The actual time should certainly be verified later in a simulation environment or in a real network deployment.

Best Regards

Aijun Wang
China Telecom

From: Greg Mirsky <gregimir...@gmail.com>
Sent: Tuesday, November 30, 2021 11:11 AM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk <rob...@raszuk.net>
Subject: Re: [Lsr] BFD aspects

Hi Aijun,
what is the guaranteed failure detection time for the IGP-based solution?

Regards,
Greg

On Mon, Nov 29, 2021 at 7:07 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

Hi, Greg:

Even if the BFD auto-configuration extensions are standardized and implemented, won’t the network be filled with detection packets instead of user packets?
For the PUA/PULSE solution, the mentioned LSA is generated only when the node status changes from “UP” to “DOWN”, whereas BFD packets are sent continuously while these PEs are active. Which one is more efficient?
Certainly, we will consider massive-failure situations, even though they occur only in very rare circumstances.

Best Regards

Aijun Wang
China Telecom

From: Greg Mirsky <gregimir...@gmail.com>
Sent: Tuesday, November 30, 2021 10:47 AM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk <rob...@raszuk.net>
Subject: Re: [Lsr] BFD aspects

Hi Aijun,
thank you for confirming that it is not the conclusion one can arrive at based on my discussion with Robert. Secondly, I wouldn't characterize the problem you describe as a scaling issue with using multi-hop BFD to monitor path continuity in the underlay network.
In my opinion, it is an operational overhead that can be addressed by an intelligent management plane or by a few extensions in the control plane that sets up the overlay. Since the management plane is usually a proprietary solution, I invite anyone interested to work on BFD auto-configuration extensions in the control plane. I would much appreciate references to the use cases that can benefit from such extensions.

Regards,
Greg

On Mon, Nov 29, 2021 at 6:26 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

Hi, Greg:

Firstly, regardless of which method is used for the multi-hop BFD approach, there is certainly configuration overhead if you imagine 10,000 PEs, the example Tony often raises. Wouldn’t you have to configure each pair of them to detect the PE-PE connection? That is obviously not scalable.

Best Regards

Aijun Wang
China Telecom

From: Greg Mirsky <gregimir...@gmail.com>
Sent: Tuesday, November 30, 2021 10:18 AM
To: Aijun Wang <wangai...@tsinghua.org.cn>
Cc: Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk <rob...@raszuk.net>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] BFD aspects

Hi Aijun,
could you please elaborate on how you see that this discussion leads to the "BFD based detection for the mentioned problem is not [...] scalable(among PEs)" conclusion? I hope that nothing I've said or suggested led you to this conclusion. Personally, I believe that BFD-based PE-PE monitoring is the best technical solution. I understand that an operator may be dissatisfied with the additional configuration of the BFD sessions. As noted, I believe that can be addressed in the management plane or by minor extensions in the control plane (BGP or not).
If a particular implementation (or a combination of the implementation and HW) has a scaling challenge with multi-hop BFD, then that may not be sufficient technical justification for a somewhat controversial proposal.

Regards,
Greg

On Mon, Nov 29, 2021 at 5:17 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

From the discussion, I think we can conclude that BFD based detection for the mentioned problem is not reliable (between PE/RR) and scalable (among PEs). The same goes for the BGP based solution. So let’s focus on how to implement it within the IGP.

Thanks to Greg for the analysis. And one supplement to Robert’s comments:
The RR is not always located within the same area as the PEs, so it cannot learn immediately that a PE node is down when summarization is configured between areas.

Best Regards

Aijun Wang
China Telecom

From: lsr-boun...@ietf.org <lsr-boun...@ietf.org> On Behalf Of Gyan Mishra
Sent: Tuesday, November 30, 2021 8:44 AM
To: Robert Raszuk <rob...@raszuk.net>
Cc: Greg Mirsky <gregimir...@gmail.com>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] BFD aspects

Robert

On Mon, Nov 29, 2021 at 7:35 PM Robert Raszuk <rob...@raszuk.net> wrote:

Hi Greg,

If BFD would have autodiscovery built in, that would indeed be the ultimate solution. Of course folks will worry about scaling and the number of BFD sessions to be run PE-PE.

GIM>> I sense that it is not "BFD autodiscovery" but an advertisement of BFD multi-hop system readiness to the particular PE. That, as I think of it, can be done in a control or management plane.

Agreed.

But if BFD between all PEs would be an option, why would RR to PE in the local area not be a viable solution?

GIM>> Because, in the case of PE-PE, BFD control packets will be fate-sharing with data packets.
But the path between RR and PE might not be used for carrying data packets at all.

100%. But that was accounted for. Reason being that you have at least two RRs in an area. The point of BFD was to detect that the PE went down.

Gyan> What Greg is alluding to is a very good point to consider: in many operator networks the RR sits in the “control plane” path, which is separate from the data plane path. So the RR has no knowledge of the E2E forwarding plane path between the PEs, as it sits outside the forwarding plane path. That being said, the PE-RR path is disjoint from the PE-PE path, so from the PE-RR point of view the RR may wrongly think the PE is up or down, hence the false positive or negative. That would be the case regardless of how many RRs are deployed.

You are absolutely right that it may report the RR disconnected from the network while the PE is up and the data plane from remote PEs can reach it. That is why we have more than one RR.

As far as fate sharing of PE-PE BFD with real user data goes, I think it is not always the case. But this is a completely separate discussion :)

Also please keep in mind that a PE going down can be learned by RRs by listening to the IGP. No BFD needed.

Both would be multihop, both would be subject to all transit failures etc ...

GIM>> I think that there's a difference in the impact a path failure has on the data traffic. Monitoring the PE-PE path in the underlay, using the same encapsulation as data traffic, is representative of the data experience. A failure of the PE-RR path, in my understanding, may not be representative at all. A BFD session between RR and PE may fail while the PE is absolutely functional from the service PoV.

Please keep in mind that this entire discussion is not about data plane failure end to end :) Yes, it's pretty sad. This entire debate is about indicating domain wide that the IGP component on a PE went down. No one considers data plane liveness or even, as you observed, data plane encapsulation congruence.
Clearly this is not a true OAM discussion.

On the other hand, PE might be disconnected from the service while the BFD session to RR is in the Up state.

Not likely if you keep in mind that to trigger any remote action such failure would have to happen to all RRs.

Thx a lot,
R.

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr

--
Gyan Mishra
Network Solutions Architect
Email gyan.s.mis...@verizon.com
M 301 502-1347
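
For reference, the arithmetic behind two figures raised in this thread (a 10 ms detection target implying a 3.3 ms packet interval, and per-pair BFD sessions among 10,000 PEs) can be sketched as below. This is only a back-of-the-envelope sketch, not a statement about any implementation. It assumes a detect multiplier of 3 (the value implied by 10 ms / 3.3 ms), a full mesh of PE-PE sessions, and one control packet per session per interval with jitter ignored. All function names are hypothetical.

```python
from math import comb


def tx_interval_ms(target_detection_ms: float, detect_mult: int = 3) -> float:
    """Packet interval needed to meet a detection-time target.

    Assumption: detection time = detect multiplier x packet interval,
    with a multiplier of 3 (implied by the 10 ms / 3.3 ms figures above).
    """
    return target_detection_ms / detect_mult


def detection_time_ms(interval_ms: float, detect_mult: int = 3) -> float:
    """Detection time that results from a given packet interval."""
    return interval_ms * detect_mult


def full_mesh_sessions(num_pes: int) -> int:
    """Number of PE-PE sessions if every pair of PEs is monitored."""
    return comb(num_pes, 2)


def sessions_per_pe(num_pes: int) -> int:
    """Sessions each PE terminates in a full mesh."""
    return num_pes - 1


def tx_packets_per_second_per_pe(num_pes: int, interval_ms: float) -> float:
    """Control packets one PE transmits per second across all its sessions.

    Assumption: one packet per session per interval, jitter ignored.
    """
    return sessions_per_pe(num_pes) * 1000.0 / interval_ms


if __name__ == "__main__":
    interval = tx_interval_ms(10.0)
    print(f"interval for a 10 ms target: {interval:.2f} ms")
    print(f"full-mesh sessions, 10,000 PEs: {full_mesh_sessions(10_000):,}")
    print(f"sessions per PE: {sessions_per_pe(10_000):,}")
    print(f"tx packets/s per PE: {tx_packets_per_second_per_pe(10_000, interval):,.0f}")
```

Under these assumptions, a full mesh among 10,000 PEs is 49,995,000 sessions in total and 9,999 per PE. A real deployment would monitor only the PE pairs that actually exchange service traffic, and would likely use longer intervals for multi-hop sessions, so these numbers are an upper bound rather than a prediction.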