Proxy purges were one of the worst ideas in IGPs, operationally speaking, for people who have dealt with this stuff in real networks for the last 20+ years, and they still are. Let's not go there.
-- tony

On Mon, Jul 15, 2024 at 1:26 PM Aijun Wang <wangai...@tsinghua.org.cn> wrote:

> Hi, Acee:
>
> If you think none of the solutions is perfect, can we find other solutions, such as assigning/selecting in advance one proxy router for the router that may be disrupted (from among its neighbors), to help purge the stale LSAs within the network on behalf of the disrupted router when it goes down?
> Doing so can quickly purge the stale LSAs of the disrupted router without waiting for it to finish starting up.
>
> Aijun Wang
> China Telecom
>
> On Jul 13, 2024, at 04:01, Acee Lindem <acee.i...@gmail.com> wrote:
>
> Les -
>
> The SA bit solution is no more “complete” than the database exchange solution. Let’s talk specific scenarios rather than FUD.
>
> So we have an LSA originated by the restarting router at time T0 and one originated by its neighbor at time T1, where T1 is after T0. Although they take the same flooding path, the one originated at T1 arrives and is processed ahead of the one originated at T0, resulting in traffic loss.
>
> I’m not arguing that this hypothetical situation isn’t possible with packet loss. However, other than the added overhead and inefficiency of the SA bit signaling resulting in some small delay, how does the SA bit solution solve this? How does the restarting router know when its updated LSAs have successfully been installed on all the routers in the area? It certainly doesn’t know any better than its neighbor.
>
> Thanks,
> Acee
> P.S. One could add a small delay to the database exchange solution once the last stale LSA is updated or purged but I don’t believe this is necessary.
>
> On Jul 12, 2024, at 14:48, Les Ginsberg (ginsberg) <ginsb...@cisco.com> wrote:
>
> Tony –
>
> What is important to me here is a common understanding and providing a complete solution.
>
> Hopefully, you are at least understanding that the point I am making is valid, i.e., traffic loss can occur even with better-idbx in place.
>
> I would also argue that you are underestimating the effect of scale.
>
> As to your argument below, it could also be used to argue against doing anything – after all we know that current OSPF does converge in a modest amount of time.
>
> Since you have decided to make things better (which I support) I do not see why we should not define a complete solution.
>
> If you, as a vendor, choose not to implement SA because you consider the cost/benefit ratio unappealing – that is your choice. So long as you and your customers are satisfied …
>
> But our mission here is to define a solution – and I am simply arguing for a more complete solution.
>
> Les
>
> *From:* Tony Przygienda <tonysi...@gmail.com>
> *Sent:* Friday, July 12, 2024 11:23 AM
> *To:* Acee Lindem <acee.i...@gmail.com>
> *Cc:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>; Liyan Gong <gongli...@chinamobile.com>; Aijun Wang <wangai...@tsinghua.org.cn>; Peter Psenak (ppsenak) <ppse...@cisco.com>; Yingzhen Qu <yingzhen.i...@gmail.com>; lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; shraddha <shrad...@juniper.net>
> *Subject:* Re: [Lsr] About Premature aging of LSA and Purge LSA
>
> Les, whatever you try to suggest here, you slide in the direction of trying to guarantee common knowledge closure (that's the technical term for what you try) and based on distributed systems theory you end up ultimately with virtual clock synchronization of the network in some form if you _really_ want to solve the problem rather than "hey, my stuff may work 2 hops away rather than 1 hop so it's much better and let's not talk about 3 hops" (look up Lamport's clock vectors/matrices for proper theoretical underpinnings of such undertakings if you'd like to take this discussion further) and this will slow things to a crawl. Worse, you will discover pretty soon that going down this path you will have to learn consistent cuts and basically transaction scheduling most likely ;-)
>
> IGPs are just IGPs, i.e. they do guarantee "eventual consistency" (in proper technical terms epsilon consistency) and that makes them fast and reacting fast to failures and that's the base of their success. This also means you have transients and this here is just one, relatively simple fix of a local transient and that's about the best you can do to preserve the desirable properties (i.e. fastest possible eventual consistency with maximum resiliency [that's the CAP paradigm part which is another way to see IGPs as _AP type of solution]).
>
> Without this kind of underlying understanding/language we are talking about "me likes my stuff with me bells and whistles better than ye' thing" and it's going in circles AFAIS.
>
> so I'm with acee here in short (and I left the fact out that as I say, flavor of this stuff is deployed since long time and works fine at any scale in our experience and it's damn' simple to implement comparatively speaking and doesn't need any big rollouts on the network compared to all the signalling machinery suggested)
>
> -- tony
>
> On Fri, Jul 12, 2024 at 6:44 PM Acee Lindem <acee.i...@gmail.com> wrote:
>
> So, I don’t think the case you are suggesting is plausible. Let’s say you have a hypothetical router somewhere in the same area that has the restarting router’s stale LSAs.
>
> 1. The restarting router’s neighbors will only advertise an adjacency once the stale LSAs have been updated or purged from their local databases.
> 2. Only then will the adjacency be advertised - so the update or purge precedes the adjacency advertisement.
> 3. How is the neighbor router’s LSA going to pass the restarting router’s LSA update or purge? It will take the same or possibly even better flooding path.
>    Will it be flooded at warp speed?
> 4. Are you suggesting that the restarting router’s LSAs are dropped but the neighbor’s advertisement is not? If so, how would the restarting router know this and delay removing the adjacency suppression? Are you relying on the inherent inefficiencies and convergence delays with LLS signaling handshake between the two routers?
>
> In any case, trying to prevent transient problems due to selective loss of updates is an exercise in futility.
>
> Thanks,
> Acee
>
> On Jul 12, 2024, at 12:13, Les Ginsberg (ginsberg) <ginsb...@cisco.com> wrote:
>
> Acee –
>
> When the restarting router goes down, the state of the LSDB in the network becomes:
>
> Restart Router LSA: All neighbors advertised
> Neighbor Routers: Neighbor to Restarting Router is removed
>
> When the restarting router comes back up, two changes will occur:
>
> 1) Restarting Router updates its LSAs
> 2) Neighbors update their LSAs to indicate they once again have a neighbor to the restarting router
>
> You cannot guarantee the flooding order network-wide.
>
> Because the stale LSAs from the Restarting Router are present in all nodes, as soon as a neighbor readvertises the adjacency to the restarting router, it is now possible that on some nodes in the network you will temporarily have an LSDB which has:
>
> Stale LSA from restarting router + Updated LSA from neighbor
>
> Whether the restarting router sends an updated LSA with neighbors or without neighbors (as you suggest) you cannot prevent the above transient condition from occurring because doing so requires guaranteeing that the update to the Neighbor LSA and the update to the restarting router LSA are done atomically network-wide.
>
> That is why the restarting router cannot do this without help from the neighbors.
>
> Hope this is clear.
>
> Les
>
> *From:* Acee Lindem <acee.i...@gmail.com>
> *Sent:* Friday, July 12, 2024 7:55 AM
> *To:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> *Cc:* Liyan Gong <gongli...@chinamobile.com>; Aijun Wang <wangai...@tsinghua.org.cn>; Peter Psenak (ppsenak) <ppse...@cisco.com>; Yingzhen Qu <yingzhen.i...@gmail.com>; lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; tony Przygienda <tonysi...@gmail.com>; shraddha <shrad...@juniper.net>
> *Subject:* Re: [Lsr] About Premature aging of LSA and Purge LSA
>
> On Jul 12, 2024, at 10:49, Les Ginsberg (ginsberg) <ginsb...@cisco.com> wrote:
>
> Acee –
>
> The neighbors do not control when the flooding of the purge/update reaches all routers in the network.
>
> The neighbors have direct control of the exchange between themselves and their immediate neighbors – nothing else.
>
> The restarting router has no better idea. If you’re suggesting suppressing advertising adjacencies until all neighbors of the restarting router are adjacent (which is a bad idea), the restarting router can do this as well by suppressing its link advertisements. There is NOTHING additional that can be accomplished by adding LLS signaling.
>
> Acee
>
> Les
>
> *From:* Acee Lindem <acee.i...@gmail.com>
> *Sent:* Friday, July 12, 2024 7:44 AM
> *To:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> *Cc:* Liyan Gong <gongli...@chinamobile.com>; Aijun Wang <wangai...@tsinghua.org.cn>; Peter Psenak (ppsenak) <ppse...@cisco.com>; Yingzhen Qu <yingzhen.i...@gmail.com>; lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; tony Przygienda <tonysi...@gmail.com>; shraddha <shrad...@juniper.net>
> *Subject:* Re: [Lsr] About Premature aging of LSA and Purge LSA
>
> On Jul 12, 2024, at 10:40, Les Ginsberg (ginsberg) <ginsb...@cisco.com> wrote:
>
> Acee –
>
> Having the restarting router suppress advertisement of its adjacencies does not address the transient state where routers in the network have received the updated LSA from the neighbor with the reestablished adjacency to the restarting router but still have the stale LSA from the restarting router that has the pre-restart adjacency advertisements. (point #1 I made below).
>
> The neighbors of the restarting router will not advertise the adjacency until the stale LSAs are purged or updated - this is the whole point of https://datatracker.ietf.org/doc/draft-hegde-lsr-ospf-better-idbx/
>
> Thanks,
> Acee
>
> So this is not a robust solution.
>
> Les
>
> *From:* Acee Lindem <acee.i...@gmail.com>
> *Sent:* Friday, July 12, 2024 7:21 AM
> *To:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> *Cc:* Liyan Gong <gongli...@chinamobile.com>; Aijun Wang <wangai...@tsinghua.org.cn>; Peter Psenak (ppsenak) <ppse...@cisco.com>; Yingzhen Qu <yingzhen.i...@gmail.com>; lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; tony Przygienda <tonysi...@gmail.com>; shraddha <shrad...@juniper.net>
> *Subject:* Re: [Lsr] About Premature aging of LSA and Purge LSA
>
> Hi Les,
>
> On Jul 12, 2024, at 02:57, Les Ginsberg (ginsberg) <ginsb...@cisco.com> wrote:
>
> I am happy that work on this problem has begun.
>
> I believe the most robust way forward is to implement the mechanisms defined in BOTH drafts.
>
> I think the mechanism defined in draft-hegde-lsr-ospf-better-idbx is sound and not overly complex (sorry Liyan 😊) and should be done.
>
> But it does not solve all aspects of the problem.
>
> It does make LSDB synchronization more robust – which addresses the control plane aspects of the problem.
>
> It also has the advantage that it does not require any support on the neighboring routers – and so the benefits can be realized simply by upgrading one router at a time.
>
> However, draft-hegde-lsr-ospf-better-idbx does not address forwarding plane aspects of the problem – which become more significant at scale.
>
> There are two aspects of this problem:
>
> 1) You do not have control over the order in which the updated LSAs are flooded to the rest of the network – so it is still possible for transient forwarding issues to occur multiple hops away from the restarting router.
>
> 2) The restarting router requires additional time – after full LSDB sync – to program the forwarding plane. It is well known that update of the forwarding plane takes much longer than protocol SPF calculation.
>
> If only a few hundred routes are supported, this may not be of significant concern, but if thousands of routes are supported the time it takes to program the forwarding plane becomes a significant contributor.
>
> I fail to see how suppressing neighbor adjacency advertisement solves any additional problems that are not solved by avoiding usage of the restarting router’s stale LSAs.
>
> Note that the OSPF SPF has a check for bi-directional connectivity, excerpted from section 16.1 of RFC2328:
>
>    (b) Otherwise, W is a transit vertex (router or transit
>        network). Look up the vertex W's LSA (router-LSA or
>        network-LSA) in Area A's link state database. If the
>        LSA does not exist, or its LS age is equal to MaxAge, or
>        *it does not have a link back to vertex V*, examine the
>        next link in V's LSA.[23]
>
> Consequently, the restarting router can simply suppress its own link advertisement for as long as is required to solve the above problems. You should be familiar with this quote:
>
> “If you want a thing done well, do it yourself.”
> ― Napoleon Bonaparte
>
> Thanks,
> Acee
>
> draft-cheng-lsr-ospf-adjacency-suppress provides a way to address the above two aspects by providing a means for the neighbors of the restarting router to delay advertisement of the restored adjacency to the restarting router. (SA signaling)
>
> It could be argued that using SA signaling eliminates the need to do anything else – but given that this mechanism depends upon support by all the neighbors of the restarting router I believe there is still good reason to implement both mechanisms.
>
> NOTE: I would prefer that the two drafts be combined into a single draft – but that is optional and up to the authors. But from the WG perspective I would like to see both solutions progress.
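The back-link check that Acee quotes from RFC 2328 section 16.1 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not an OSPF implementation: the dictionary-based LSDB, the router names R, N and X, and the helpers `usable_neighbors` and `make_lsdb` are all hypothetical, and network-LSAs are left out. It shows the transient Les describes (a stale LSA from restarting router R combined with an updated LSA from neighbor N passes the check) and two ways discussed in the thread to make the check fail (R re-originating without links, or R's stale LSA reaching MaxAge).

```python
MAX_AGE = 3600  # MaxAge in seconds (RFC 2328, Appendix B)


def usable_neighbors(lsdb, v):
    """Return the neighbors W listed in V's LSA that pass the quoted check:
    W's LSA exists, is not at MaxAge, and has a link back to V."""
    usable = []
    for w in sorted(lsdb[v]["links"]):
        lsa = lsdb.get(w)
        if lsa is None or lsa["age"] == MAX_AGE or v not in lsa["links"]:
            continue  # "examine the next link in V's LSA"
        usable.append(w)
    return usable


def make_lsdb(r_age, r_links):
    """Hypothetical three-router area; only R's LSA varies per scenario."""
    return {
        "N": {"age": 5, "links": {"R", "X"}},  # N has re-advertised R
        "X": {"age": 900, "links": {"N"}},
        "R": {"age": r_age, "links": set(r_links)},
    }


stale_lsdb = make_lsdb(r_age=1200, r_links={"N"})      # pre-restart LSA still present
suppressed_lsdb = make_lsdb(r_age=1, r_links=set())    # R re-originated without links
purged_lsdb = make_lsdb(r_age=MAX_AGE, r_links={"N"})  # stale LSA flushed

print(usable_neighbors(stale_lsdb, "N"))       # ['R', 'X']
print(usable_neighbors(suppressed_lsdb, "N"))  # ['X']
print(usable_neighbors(purged_lsdb, "N"))      # ['X']
```

In the first scenario the link N to R is used even though R has not resynchronized, which is the false bidirectional adjacency under discussion. The sketch says nothing about flooding order across multiple hops, which is the part of the argument that remains disputed in the thread.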
>
> Les
>
> *From:* Liyan Gong <gongli...@chinamobile.com>
> *Sent:* Thursday, July 11, 2024 8:22 PM
> *To:* Acee Lindem <acee.i...@gmail.com>; Aijun Wang <wangai...@tsinghua.org.cn>
> *Cc:* Peter Psenak (ppsenak) <ppse...@cisco.com>; Yingzhen Qu <yingzhen.i...@gmail.com>; lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; tony Przygienda <tonysi...@gmail.com>; shraddha <shrad...@juniper.net>
> *Subject:* [Lsr] Re: About Premature aging of LSA and Purge LSA
>
> Hi Acee and Aijun,
>
> Thank you very much for your discussion. I would like to share my thoughts on the proposed solutions.
>
> In my view, *draft-hegde-lsr-ospf-better-idbx* may not be as straightforward as it initially appears.
>
> Despite its local applicability, it entails a complex neighbor establishment process, which is fundamental to the OSPF protocol and typically not altered lightly by those familiar with its workings.
>
> On the other hand, draft-cheng-lsr-ospf-adjacency-suppress presents a more focused approach tailored to address the specific issue without unintended consequences.
>
> I still believe the key factor in evaluating any approach is whether it impacts the current systems negatively.
>
> Regarding our extensive discussions on these drafts, please refer to our previous records for more details.
>
> https://mailarchive.ietf.org/arch/search/?q=%22draft-cheng-lsr-ospf-adjacency-suppress%22
>
> Thank you for your attention to this matter.
>
> Best Regards,
> Liyan
>
> ----Original Message----
> *From:* Acee Lindem <acee.i...@gmail.com>
> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
> *Cc:* Peter Psenak <ppse...@cisco.com>, Yingzhen Qu <yingzhen.i...@gmail.com>, lsr <lsr@ietf.org>, lsr-chairs <lsr-cha...@ietf.org>, tony Przygienda <tonysi...@gmail.com>, shraddha <shrad...@juniper.net>
> *Sent:* 2024-07-11 23:26:57
> *Subject:* [Lsr] Re: About Premature aging of LSA and Purge LSA
>
> As WG member:
>
> On Jul 11, 2024, at 05:29, Aijun Wang <wangai...@tsinghua.org.cn> wrote:
>
> And, there is also another draft that aims to solve a similar problem, https://datatracker.ietf.org/doc/html/draft-cheng-lsr-ospf-adjacency-suppress-02, which it describes as similar to the solution in IS-IS. Why not take this approach?
>
> Because this one doesn’t require any signaling and can be accomplished via local behavior without requiring support from any other OSPF router. Additionally, it is simpler. Well, at least for someone who has a deep understanding of the protocol.
>
> Thanks,
> Acee
>
> Best Regards
>
> Aijun Wang
> China Telecom
>
> *From:* forwardingalgori...@ietf.org [mailto:forwardingalgori...@ietf.org <forwardingalgori...@ietf.org>] *On Behalf Of* Aijun Wang
> *Sent:* July 11, 2024 17:20
> *To:* 'Acee Lindem' <acee.i...@gmail.com>
> *Cc:* 'Peter Psenak' <ppse...@cisco.com>; 'Yingzhen Qu' <yingzhen.i...@gmail.com>; 'lsr' <lsr@ietf.org>; 'lsr-chairs' <lsr-cha...@ietf.org>; 'tony Przygienda' <tonysi...@gmail.com>; 'shraddha' <shrad...@juniper.net>
> *Subject:* [Lsr] Reply: Re: About Premature aging of LSA and Purge LSA
>
> For the neighbors of the restarting router, why can’t they directly delete the LSAs originated by the restarting router, instead of putting them into a “Stale DB Exchange list”, when they detect that their neighbor is down?
>
> *From:* forwardingalgori...@ietf.org [mailto:forwardingalgori...@ietf.org <forwardingalgori...@ietf.org>] *On Behalf Of* Acee Lindem
> *Sent:* July 10, 2024 22:14
> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
> *Cc:* Peter Psenak <ppse...@cisco.com>; Yingzhen Qu <yingzhen.i...@gmail.com>; lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; tony Przygienda <tonysi...@gmail.com>; shraddha <shrad...@juniper.net>
> *Subject:* [Lsr] Re: About Premature aging of LSA and Purge LSA
>
> Yes - but the whole discussion of adjacency suppression and database synchronization is based on preventing TEMPORARY usage of stale LSAs leading to false bidirectional adjacencies during unplanned restart. RFC 2328 OSPF will converge without any modifications - there can just be transient traffic drops and/or loops.
>
> Thanks,
> Acee
>
> On Jul 9, 2024, at 20:42, Aijun Wang <wangai...@tsinghua.org.cn> wrote:
>
> For the unplanned restart, shouldn’t it be the responsibility of the directly connected neighbors to send out such LSAs to purge the obsolete LSAs?
>
> Best Regards
>
> Aijun Wang
> China Telecom
>
> *From:* forwardingalgori...@ietf.org [mailto:forwardingalgori...@ietf.org <forwardingalgori...@ietf.org>] *On Behalf Of* Acee Lindem
> *Sent:* July 9, 2024 20:14
> *To:* Peter Psenak <ppse...@cisco.com>
> *Cc:* Aijun Wang <wangai...@tsinghua.org.cn>; Yingzhen Qu <yingzhen.i...@gmail.com>; lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; tony Przygienda <tonysi...@gmail.com>; shraddha <shrad...@juniper.net>
> *Subject:* [Lsr] Re: About Premature aging of LSA and Purge LSA
>
> Additionally, you certainly don’t need a standards track solution to this problem. An implementation could honor MinLSInterval by simply locally keeping its own list of self-originated MaxAge LSAs and delaying reorigination.
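Acee's suggestion of a local list of self-originated MaxAge LSAs might look roughly like the Python sketch below. Everything here is an assumption made for illustration: the class name `FlushedLsaList`, its methods, the tuple used as an LSA key, and in particular the choice to measure MinLSInterval from the moment the MaxAge instance was originated. The thread does not specify any of these details.

```python
MIN_LS_INTERVAL = 5.0  # MinLSInterval in seconds (RFC 2328, Appendix B)


class FlushedLsaList:
    """Hypothetical local bookkeeping of self-originated LSAs that were
    prematurely aged, used to delay their reorigination."""

    def __init__(self):
        self._flushed_at = {}  # LSA key -> time the MaxAge instance was sent

    def record_flush(self, lsa_key, now):
        """Remember when a self-originated LSA was flushed."""
        self._flushed_at[lsa_key] = now

    def reorigination_delay(self, lsa_key, now):
        """Seconds to wait before reoriginating; 0.0 if no wait is needed."""
        flushed = self._flushed_at.get(lsa_key)
        if flushed is None:
            return 0.0
        return max(0.0, flushed + MIN_LS_INTERVAL - now)

    def mark_reoriginated(self, lsa_key):
        """Drop the entry once the new instance has been originated."""
        self._flushed_at.pop(lsa_key, None)


pending = FlushedLsaList()
key = ("router-lsa", "192.0.2.1")  # illustrative LSA type and Link State ID
pending.record_flush(key, now=100.0)

print(pending.reorigination_delay(key, now=102.0))  # 3.0
print(pending.reorigination_delay(key, now=106.0))  # 0.0
```

The point of the sketch is that this is purely local state: no new LSA type or signaling is involved, which matches the argument that a standards track mechanism is not required for this part.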
>
> Thanks,
> Acee
>
> On Jul 9, 2024, at 04:13, Peter Psenak <ppse...@cisco.com> wrote:
>
> Aijun,
>
> On 09/07/2024 09:46, Aijun Wang wrote:
>
> Hi, Acee:
>
> Can the proposal in https://datatracker.ietf.org/doc/html/draft-dong-ospf-purge-lsa-00, together with https://datatracker.ietf.org/doc/html/rfc2328#section-14.1 (Premature aging of LSAs), solve your mentioned problem?
>
> If so, is it simpler than your proposal?
>
> That is, before the router restarts, it need only send out the Purge LSA (when the LSA sequence number is not about to wrap) or prematurely age its LSA (when the sequence number is about to wrap).
>
> does not work for unplanned restart.
>
> thanks,
> Peter
>
> Best Regards
>
> Aijun Wang
> China Telecom
>
> *From:* forwardingalgori...@ietf.org [mailto:forwardingalgori...@ietf.org <forwardingalgori...@ietf.org>] *On Behalf Of* Acee Lindem
> *Sent:* July 9, 2024 3:58
> *To:* Yingzhen Qu <yingzhen.i...@gmail.com>
> *Cc:* lsr <lsr@ietf.org>; lsr-chairs <lsr-cha...@ietf.org>; tony Przygienda <tonysi...@gmail.com>; shraddha <shrad...@juniper.net>
> *Subject:* [Lsr] Re: IETF 120 LSR Slot Requests
>
> Speaking as WG member:
>
> I would like a 10 minute slot to present an update to https://datatracker.ietf.org/doc/draft-hegde-lsr-ospf-better-idbx/
>
> Thanks,
> Acee
>
> On Jun 25, 2024, at 14:19, Yingzhen Qu <yingzhen.i...@gmail.com> wrote:
>
> Hi,
>
> The draft agenda for IETF 120 has been posted:
>
> IETF 120 Meeting Agenda <https://datatracker.ietf.org/meeting/120/agenda/>
>
> The LSR session is scheduled on Friday Session I1 9:30 - 11:30, July 26, 2024.
>
> Please send slot requests to lsr-cha...@ietf.org before the end of the day Wednesday July 10th. Please include draft name and link, presenter, desired slot length including Q&A.
>
> Please note that having a discussion on the LSR mailing list is a prerequisite for a draft presentation in the WG session. If you need any help please reach out to the chairs.
>
> Thanks,
> Yingzhen
>
> _______________________________________________
> Lsr mailing list -- lsr@ietf.org
> To unsubscribe send an email to lsr-le...@ietf.org