Re: [Lsr] BGP vs PUA/PULSE

Tony Przygienda Thu, 02 Dec 2021 02:50:53 -0800

Idly thinking about this stuff, more and more issues pop up that confirm my
initial gut feeling that the pulse stuff is simply not something an IGP can
reasonably do (i.e. liveliness). Negative as a liveliness indication is
arguably even worse ;-) but I think most of us agreed on that across those
hundreds of emails by now.


So, to expound a bit. IGP reachability, which is what IGP normally does, is
_very_ different from liveliness, and here's another example (I describe it
in principle but people who deployed stuff will know what scenarios I'm
talking about).

So, in short, the fact that an IGP node, let's say an ABR, advertises a
summary has _nothing_ much to do with the liveliness of what it summarizes
in a system-wide sense. More specifically, the fact that this aggregate goes
away, or that IGP cannot compute _reachability_ to a specific address/node,
does NOT mean that the prefix advertised by such a node is not _alive_.

Imagine (often done in fact in deployments I dealt with) that the prefix
advertised by a node into IGP is not _reachable_ by IGP all of a sudden,
simplest case being a link loss of course. However, it is in the system
still reachable by means e.g. of a default route from another protocol or a
specific route (static?) over a link IGP is not running on. Now, if IGP
starts to pulse it will defeat the very purpose of such backup.
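
To make the scenario above concrete, here is a minimal sketch of a
multi-protocol routing table. Everything in it is a hypothetical
illustration, not taken from the draft or from any real implementation:
the Rib class, the "isis" and "static" protocol labels, and the
documentation address 192.0.2.1 are all assumptions. It only shows that
"IGP lost the route" and "the system can no longer forward to the prefix"
are different questions.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: object      # ipaddress network object
    protocol: str       # hypothetical labels: "isis", "static"
    next_hop: str


class Rib:
    """Toy system-wide routing table holding routes from several protocols."""

    def __init__(self):
        self._routes = []

    def add(self, prefix, protocol, next_hop):
        self._routes.append(
            Route(ipaddress.ip_network(prefix), protocol, next_hop))

    def withdraw(self, prefix, protocol):
        # Remove only the route that this protocol contributed.
        net = ipaddress.ip_network(prefix)
        self._routes = [r for r in self._routes
                        if not (r.prefix == net and r.protocol == protocol)]

    def lookup(self, address):
        # Longest-prefix match over all protocols; None if nothing covers it.
        addr = ipaddress.ip_address(address)
        candidates = [r for r in self._routes if addr in r.prefix]
        return max(candidates, key=lambda r: r.prefix.prefixlen, default=None)

    def igp_reachable(self, address):
        # The only question the IGP itself can answer.
        addr = ipaddress.ip_address(address)
        return any(r.protocol == "isis" and addr in r.prefix
                   for r in self._routes)


rib = Rib()
rib.add("192.0.2.1/32", "isis", "core-link")    # PE loopback learned via IGP
rib.add("0.0.0.0/0", "static", "backup-link")   # default over a non-IGP link

before = rib.lookup("192.0.2.1")    # resolves via the IS-IS host route

rib.withdraw("192.0.2.1/32", "isis")            # link loss: IGP route is gone

after = rib.lookup("192.0.2.1")     # still resolves, via the static default
igp_view = rib.igp_reachable("192.0.2.1")       # False: IGP has lost the prefix
```

In this toy model a negative pulse would have to be derived from igp_view
alone, while forwarding would follow the result stored in after, which is
exactly the mismatch described above.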

And no, you cannot "know" whether a backup is there. There are even funky
cases where a policy only installs a backup route if the primary went away,
which may be fast enough to keep e.g. TCP up (whether it's the best
possible architecture is disputable but it's a fact of life that such stuff
exists).

So, basically we try to invent "liveliness indication" in IGP, whereas IGP
cannot know whether the prefix is still reachable system-wide even when IGP
itself has lost _reachability_ to it.

And yes, before we go there, I know that with enough "limited domain" and
"limited scale" and "limited use case" arguments anything one can imagine
"works" ...

--- tony

On Wed, Dec 1, 2021 at 8:13 PM Les Ginsberg (ginsberg) <ginsb...@cisco.com>
wrote:

> Tony –
>
>
>
> Inline.
>
>
>
> *From:* Tony Przygienda <tonysi...@gmail.com>
> *Sent:* Wednesday, December 1, 2021 9:33 AM
> *To:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> *Cc:* Peter Psenak (ppsenak) <ppse...@cisco.com>; Hannes Gredler <
> han...@gredler.at>; lsr <lsr@ietf.org>; Tony Li <tony...@tony.li>; Aijun
> Wang <wangai...@tsinghua.org.cn>; Robert Raszuk <rob...@raszuk.net>;
> Shraddha Hegde <shrad...@juniper.net>
> *Subject:* Re: [Lsr] BGP vs PUA/PULSE
>
>
>
> "
>
> Nodes which originate FSP-LSPs MUST
>
>    remember the last sequence number used for a given FSP-LSP and
>
>    increment the sequence number when generating a new version.
>
>
>
>    FSP-LSP generation SHOULD utilize the "next" FSP-LSP ID each time new
>
>    pulse information needs to be advertised i.e., if the most recent
>
>    FSP-LSP ID used was A-00.n, the next set of pulse information SHOULD
>
>    be advertised using FSP-LSP.ID A-00.n+1.  This minimizes the
>
>    possibility of confusion if other routers in the network have not yet
>
>    removed A-00.n from their LSPDB.
> "
>
> So you tell me I over-interpreted as "between restarts" ;-) OK, fine. Fair
> 'nuff. Maybe add one sentence clarification.
>
> *[LES:] Sure.*
>
> Otherwise yeah, I'd like the draft to add the "in case of partition things 
> may break but it's not much worse than before" ;-) and "assumption is that 
> the overlay will retry after dropping session on negative so no positives are 
> needed" and I'm ok with this thread.
>
> *[LES:] I think significantly more needs to be said about the current use 
> case for event notification – and this point can be part of that. Look for 
> that in the next revision of the draft.*
>
> my big gripe about "don't do it in main ISIS, take service instance" remains 
> though due to scalability concerns that bunch of senior folks here raised 
> already
>
> *[LES:] I am not in favor of a separate instance in this case. Reason being 
> all of the information required to determine when to send pulses is already 
> known by the main instance. Moving the pulse advertisements themselves to a 
> separate instance would likely be more costly in resources on the routers 
> themselves than advertising them in the main instance. Scale considerations 
> need to be addressed – as has been stated in this and earlier threads many 
> times – and that would be true regardless of whether we used the main 
> instance or a separate instance. *
>
> *There is also the point made by Greg Mirsky early on in this discussion – 
> that the use of event-notification needs to be carefully limited to cases 
> that make sense for the main routing instance. The next revision of the draft 
> will also address this point.*
>
> *    Les*
>
> -- tony
>
>
>
> On Wed, Dec 1, 2021 at 5:52 PM Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> wrote:
>
> Tony –
>
>
>
>
>
> *From:* Tony Przygienda <tonysi...@gmail.com>
> *Sent:* Wednesday, December 1, 2021 7:58 AM
> *To:* Peter Psenak (ppsenak) <ppse...@cisco.com>
> *Cc:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>; Hannes Gredler <
> han...@gredler.at>; lsr <lsr@ietf.org>; Tony Li <tony...@tony.li>; Aijun
> Wang <wangai...@tsinghua.org.cn>; Robert Raszuk <rob...@raszuk.net>;
> Shraddha Hegde <shrad...@juniper.net>
> *Subject:* Re: [Lsr] BGP vs PUA/PULSE
>
>
>
> 1. my question is different. why does the draft say that seqnr# & IDs have
> to be preserved between restarts
>
>
>
>
>
> *[LES:] Section 4.3.1 of the draft tries to answer your question – but
> there is no mention of “restart” there.*
>
> *There is in fact no mention of restart anywhere in the draft other than
> to say pulses are not preserved across restarts.*
>
>
>
> *We only retain the sequence #’s to make it easier to distinguish a new
> Pulse LSP from a retransmission.*
>
>
>
>
>
> 2. I'm still concerned about L1/L2 hierarchy. If an L2 border sees same
> prefix negative pulses from two different L1/L2s  it still has to keep
> state to only pulse into L1 after _all_ the guys pulsed negative (which is
> basically impossible since the _negative_ cannot persist it seems). Now how
> will it even know that? it has to keep track who advertised the same
> summary & who pulsed or otherwise it will pulse on anyone with a summary
> giving a pulse and with that anycast won't work AFAIS and worse you get
> into weird situations where you have 2 L1/L2 into same L1 area, one lost
> link to reach the PE (arguably L1 got partitioned) and pulses & then the
> L1/L2 on the border of the down L1 pulses and tears the session down albeit
> the prefix is perfectly reachable through the other L1/L2. I assume that
> parses for the cognoscenti ...
>
>
>
> *[LES:] We are not trying to handle the area partition case.*
>
> *In such a case, even if nothing is done, traffic will flow via both ABRs
> and half of it will be dropped – so one could argue that switching BGP
> traffic to the backup path is still a good idea.*
>
>
>
> *   Les*
>
>
>
> -=--- tony
>
>
>
> On Wed, Dec 1, 2021 at 4:00 PM Peter Psenak <ppse...@cisco.com> wrote:
>
> Tony,
>
> On 01/12/2021 15:31, Tony Przygienda wrote:
>
> >
> > Or maybe I missed something in the draft or between the lines in the
> > whole thing ... Do we assume the negative just quickly tears down the
> > BGP session & then it loses any relevance and we rely on BGP to retry
> > after reset automatically or something?
>
> yes.
>
>
> But then why do we even care about retaining the LSP IDs & SeqNr# would
> I ask?
>
> it's used for the purpose of flooding, so that during the flooding you
> do not flood the same pulse LSP multiple times.
>
> thanks,
> Peter
>
>
> >
> > -- tony
> >
> >
> >
> >
> >
> > On Tue, Nov 30, 2021 at 11:19 PM Les Ginsberg (ginsberg)
> > <ginsberg=40cisco....@dmarc.ietf.org
> > <mailto:40cisco....@dmarc.ietf.org>> wrote:
> >
> >     Hannes -
> >
> >     Please see
> >
> https://datatracker.ietf.org/doc/html/draft-ppsenak-lsr-igp-event-notification-00#section-4.1
> >
> >     The new Pulse LSPs don't have remaining lifetime - quite
> intentionally.
> >     They are only retained long enough to support flooding.
> >
> >     But, you remind me that we need to specify how the checksum is
> >     calculated. Will do that in the next revision.
> >
> >     Thanx.
> >
> >          Les
> >
> >      > -----Original Message-----
> >      > From: Hannes Gredler <han...@gredler.at <mailto:han...@gredler.at
> >>
> >      > Sent: Tuesday, November 30, 2021 11:22 AM
> >      > To: Peter Psenak (ppsenak) <ppse...@cisco.com
> >     <mailto:ppse...@cisco.com>>
> >      > Cc: Robert Raszuk <rob...@raszuk.net <mailto:rob...@raszuk.net>>;
> >     Les Ginsberg (ginsberg)
> >      > <ginsb...@cisco.com <mailto:ginsb...@cisco.com>>; Aijun Wang
> >     <wangai...@tsinghua.org.cn <mailto:wangai...@tsinghua.org.cn>>; lsr
> >      > <lsr@ietf.org <mailto:lsr@ietf.org>>; Tony Li <tony...@tony.li
> >     <mailto:tony...@tony.li>>; Shraddha Hegde
> >      > <shrad...@juniper.net <mailto:shrad...@juniper.net>>
> >      > Subject: Re: [Lsr] BGP vs PUA/PULSE
> >      >
> >      > hi peter,
> >      >
> >      > Just curious: Do you have an idea how to make short-lived LSPs
> >     compatible
> >      > with the problem stated in
> >      > https://datatracker.ietf.org/doc/html/rfc7987
> >      >
> >      > Would like to hear your thoughts on that.
> >      >
> >      > thanks,
> >      >
> >      > /hannes
> >      >
> >      > On Tue, Nov 30, 2021 at 01:15:04PM +0100, Peter Psenak wrote:
> >      > | Hi Robert,
> >      > |
> >      > | On 30/11/2021 12:40, Robert Raszuk wrote:
> >      > | > Hey Peter,
> >      > | >
> >      > | >      > #1 - I am not ok with the ephemeral nature of the
> >     advertisements. (I
> >      > | >      > proposed an alternative).
> >      > | >
> >      > | >     LSPs have their age today. One can generate LSP with the
> >     lifetime of 1
> >      > | >     min. Protocol already allows that.
> >      > | >
> >      > | >
> >      > | > That's a pretty clever comparison indeed. I had a feeling it
> >     will come
> >      > | > up here and here you go :)
> >      > | >
> >      > | > But I am afraid this is not comparing apple to apples.
> >      > | >
> >      > | > In LSPs or LSA flooding you have a bunch of mechanisms to
> >     make sure the
> >      > | > information stays fresh
> >      > | > and does not time out. And the default refresh in ISIS if I
> >     recall was
> >      > | > something like 15 minutes ?
> >      > |
> >      > | yes, default refresh is 900 for the default lifetime of 1200
> >     sec. Most
> >      > | people change both to much larger values.
> >      > |
> >      > | If I send the LSP with the lifetime of 1 min, there will never
> >     be any
> >      > | refresh of it. It will last 1 min and then will be purged and
> >     removed from
> >      > | the database. The only difference with the Pulse LSP is that it
> >     is not
> >      > | purged to avoid additional flooding.
> >      > |
> >      > |
> >      > | >
> >      > | >     Today in all MPLS networks host routes from all areas are
> >     "spread"
> >      > | >     everywhere including all P and PE routers, that's how LS
> >     protocols
> >      > | >     distribute data, we have no other way to do that in LS
> IGPs.
> >      > | >
> >      > | >
> >      > | > Can't you run OSPF over GRE ? For ISIS Henk had proposal not
> >     so long ago
> >      > | > to run it over TCP too.
> >      > | >
> >
> https://datatracker.ietf.org/doc/html/draft-hsmit-lsr-isis-flooding-over-
> >      > tcp-00
> >      > |
> >      > | you can run anything over GRE, including IGPs, and you don't
> >     need TCP
> >      > | transport for that. I don't see the relevance here. Are you
> >     suggesting to
> >      > | create GRE tunnels to all PEs that need the pulses? Nah, that
> >     would be an
> >      > | ugly requirement.
> >      > |
> >      > | thanks,
> >      > | Peter
> >      > |
> >      > |
> >      > | >
> >      > | > Seems like a perfect fit !
> >      > | >
> >      > | > Thx,
> >      > | > R.
> >      > |
> >
> >     _______________________________________________
> >     Lsr mailing list
> >     Lsr@ietf.org <mailto:Lsr@ietf.org>
> >     https://www.ietf.org/mailman/listinfo/lsr
> >
>
>
