Re: [Lsr] Comments on draft-chen-lsr-isis-big-tlv-00

Christian Hopps Wed, 29 Mar 2023 04:04:44 -0700


"Les Ginsberg (ginsberg)" <ginsb...@cisco.com> writes:

Chris -

-----Original Message-----
From: Christian Hopps <cho...@chopps.org>
Sent: Tuesday, March 28, 2023 11:40 PM
To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
Cc: Christian Hopps <cho...@chopps.org>; Huaimo Chen
<huaimo.c...@futurewei.com>; draft-chen-lsr-isis-big-tlv.auth...@ietf.org;
lsr@ietf.org
Subject: Re: [Lsr] Comments on draft-chen-lsr-isis-big-tlv-00

Supporting incremental upgrading of routers in a network, with the
understanding that only the upgraded routers will take advantage of a new
feature -- is normal. In fact, it's what we normally strive for; flag-days are
bad.


[LES:] Incremental deployment is possible - even expected - when a new 
advertisement is defined in support of some new feature.
Nodes which don’t support the new feature aren’t making use of the new 
advertisement, so it does not impact them.
That is not what we are dealing with when advertising more than 255 bytes of
information. The need to do so does not arise because of the introduction of new
sub-TLVs. It arises because the use of existing sub-TLVs in quantity exceeds 255
bytes.
This means that all nodes in the network - whether they support the encoding of
more than 255 bytes or not - may need to parse any or all of the advertised
information. But when some of that information is advertised in a TLV that some
nodes may not parse correctly (either because they don’t support MP or they
don’t support Big TLV) then even "legacy nodes" are impacted.
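
A minimal sketch of that failure mode, in Python, may help. It rests on assumptions that are not taken from either draft: the object key is a fixed 4-byte identifier repeated in every TLV, sub-TLVs use a one-octet type and one-octet length, and a "legacy" receiver simply uses the first TLV it sees for a key (real legacy behavior varies by implementation). The function names are hypothetical.

```python
from typing import List, Tuple

MAX_TLV_VALUE = 255  # the TLV length field is a single octet

SubTlv = Tuple[int, bytes]  # (sub-TLV type, sub-TLV value)


def pack_multi_tlv(key: bytes, sub_tlvs: List[SubTlv]) -> List[bytes]:
    """Spread the sub-TLVs for one object over TLV values of <= 255 bytes.

    Each TLV value starts with the same key (assumed fixed-size identifier).
    """
    values: List[bytes] = []
    current = key
    for sub_type, sub_value in sub_tlvs:
        encoded = bytes([sub_type, len(sub_value)]) + sub_value
        if len(key) + len(encoded) > MAX_TLV_VALUE:
            raise ValueError("a single sub-TLV does not fit in one TLV")
        if len(current) + len(encoded) > MAX_TLV_VALUE:
            values.append(current)  # this TLV is full, start another
            current = key
        current += encoded
    if len(current) > len(key) or not values:
        values.append(current)
    return values


def parse_sub_types(value: bytes, key_len: int) -> List[int]:
    """Return the sub-TLV types carried in one TLV value."""
    types: List[int] = []
    i = key_len
    while i < len(value):
        types.append(value[i])
        i += 2 + value[i + 1]  # skip type, length and value octets
    return types


def visible_sub_types(values: List[bytes], key_len: int,
                      supports_multi_tlv: bool) -> List[int]:
    """Sub-TLV types a receiver acts on.

    Assumption: a receiver without multi-TLV support uses only the first TLV.
    """
    considered = values if supports_multi_tlv else values[:1]
    return [t for v in considered for t in parse_sub_types(v, key_len)]
```

With ten sub-TLVs of 30 value bytes each and a 4-byte key, this sketch produces two TLVs, and the assumed legacy receiver acts on sub-TLVs 1 through 7 only, so a node that needs sub-TLV 8 or 10 is affected even though it never asked for anything new.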

[LES:] Please see my reply to Bruno for further detail. And please look at and 
respond to the example I previously provided.


In any case, there's a pivot here from "this doesn't technically work" to "it
technically works, but no one would want it" which is now a non-technical
assertion that I disagree with.

[LES:] I have no idea what triggers you to make this statement. At best it is a 
"cheap shot" at what I have stated which can be summarized as:

a) Big TLV does not provide what the draft claims it does
b) Having two ways to do the same thing is undesirable


I'm basing the technical position on what had been discussed on the thread 
including required changes to make incremental deployment work, i.e.,

1) when container/big tlv is used it MUST be exclusive i.e., the contained TLVs 
are removed from the top level tlv space. So only container TLVs now exist for 
the contained TLV data.

2) nodes supporting this new container/big tlv definition set a capability bit 
indicating this support.

So a legacy node will not have access to contained TLVs at all, and an upgraded 
node will disregard the legacy nodes (those w/o the capability bit set) when 
container TLVs are present in the LSDB. So there are no mysterious failures in 
this network, as all nodes are acting on the same logical LSDB.
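
As a rough illustration of points 1) and 2), here is a Python sketch of how an upgraded node might pick the set of nodes it computes over. Everything in it is an assumption made for illustration: the `NodeLsp` record, the `container_capable` flag standing in for the capability bit, and the rule that legacy nodes are disregarded as soon as any container TLV appears in the LSDB. None of this is text from the draft.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class NodeLsp:
    name: str
    container_capable: bool   # hypothetical capability bit from the node's LSP
    uses_container_tlv: bool  # True if the node advertises any container TLV


def usable_nodes(lsdb: List[NodeLsp]) -> Set[str]:
    """Nodes an upgraded router would include in its computation.

    Assumed rule: once any container TLV is present, only nodes that set the
    capability bit are used; otherwise every node is used.
    """
    if not any(n.uses_container_tlv for n in lsdb):
        return {n.name for n in lsdb}
    return {n.name for n in lsdb if n.container_capable}


def excluded_legacy_nodes(lsdb: List[NodeLsp]) -> List[str]:
    """Names of nodes left out, e.g. so an upgraded router can log them."""
    kept = usable_nodes(lsdb)
    return sorted(n.name for n in lsdb if n.name not in kept)
```

For an LSDB with upgraded A (advertising a container TLV), upgraded B, and legacy C, `usable_nodes` returns A and B, and `excluded_legacy_nodes` returns C, which is the kind of feedback an operator could be given.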

Your example only upgraded a single node, so of course there's no-one for it to 
talk to.

A more realistic example of incremental deployment has the operator upgrading A 
*and* B and only expecting A and B to work. C will not be used until its 
software is upgraded. As a bonus, upgraded routers (A) could log the fact that 
legacy nodes like C were left out b/c their capability bit was not set.

The current plan is that the operator has to figure out which vendor supports 
the only-now-specified multi-TLV behavior and the correct software version from 
that vendor (if it exists), and then upgrade (or disable) all routers that 
might use the new larger TLVs. There is no feedback that they have routers that 
do not support multi-TLVs and are now incorrectly interpreting these 
multi-TLVs; things just fail in various unpredictable ways.

IMO this is a rather sub-optimal solution: basically we do nothing except 
document what everyone should do, and let the operators figure the rest out.

Thanks,
Chris.
[as wg-member]

Les

Thanks,
Chris.

"Les Ginsberg (ginsberg)" <ginsb...@cisco.com> writes:

> Chris -
>
> <snip>
> However, that is the missing piece, so it works if we also add a capability
> bit.
> If we have the capability bit you now know which routers are processing the
> container TLV and which aren't. That should be enough info to route
> correctly.
>
> Using a container TLV *and* a capability bit is not a free lunch, but it
> should work to allow incremental deployment safely. If that's something we
> want as a WG.
> <end snip>
>
> No - this does not work.
> Customer deploys some features. They expect all routers in the network to
> be able to correctly calculate topology and correctly forward for the
> features they support.
> They do not deploy a feature and expect only a subset of the routers in the
> network that are configured to support the feature to correctly calculate
> paths.
>
> There is no way to successfully support incremental deployment.
>
> I already gave an example in my comments below:
>
>> >> > [LES:] Consider the following simple example.
>> >> >
>> >> > Node A needs to send 10 sub-TLVs about a particular object –
>> >> > requiring more than 255 bytes to be sent.
>> >> >
>> >> > Some nodes in the network do not support reception of more than 255
>> >> > bytes/object. Consider two such nodes.
>> >> >
>> >> > Node B, based on the local configuration, needs to be able to receive
>> >> > sub-TLVs 1,3,5,7,9 from A.
>> >> >
>> >> > Node C, based on local configuration, needs to be able to receive
>> >> > sub-TLVs 2,4,6,8,10 from A.
>> >> >
>> >> >
>> >> >
>> >> > There is no way that A can advertise all 10 sub-TLVs in a way which
>> >> > allows both B and C to correctly process the sub-TLVs they require.
>> >> >
>
>    Les
>
>> -----Original Message-----
>> From: Christian Hopps <cho...@chopps.org>
>> Sent: Tuesday, March 28, 2023 9:52 AM
>> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
>> Cc: Christian Hopps <cho...@chopps.org>; Huaimo Chen
>> <huaimo.c...@futurewei.com>; draft-chen-lsr-isis-big-tlv.auth...@ietf.org;
>> lsr@ietf.org
>> Subject: Re: [Lsr] Comments on draft-chen-lsr-isis-big-tlv-00
>>
>>
>> "Les Ginsberg (ginsberg)" <ginsb...@cisco.com> writes:
>>
>> > Chris -
>> >
>> > Please see inline - I'll try to conform to your request about ">>>"
>> > quoting - but given that this style does not identify who made the
>> > comment, I have found in the past that this style becomes very hard to
>> > follow after a couple of replies.
>> > Though perhaps that could be said of any style. 😊
>>
>> Well in the ">>>" style my text that you were quoting would have been
>>
>> "> like this"
>>
>> and yours would not have anything preceding it.. like mine is here.
>>
>> anyway, it's a losing battle against html; I typically just load these
>> emails into chrome when I need to read them..
>>
>> >> -----Original Message-----
>> >> From: Christian Hopps <cho...@chopps.org>
>> >> Sent: Tuesday, March 28, 2023 7:27 AM
>> >> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
>> >> Cc: Huaimo Chen <huaimo.c...@futurewei.com>;
>> >> draft-chen-lsr-isis-big-tlv.auth...@ietf.org; lsr@ietf.org
>> >> Subject: Re: [Lsr] Comments on draft-chen-lsr-isis-big-tlv-00
>> >>
>> >>
>> >> Hi,
>> >>
>> >> So I agree that using this new container TLV along with old TLVs doesn't
>> >> help.
>> >>
>> >
>> >>>[LES:] Good - we agree.
>> >
>> >> However, it is worth noting that if *only* the container TLV was used
>> >> (i.e., once a TLV became too large it would be removed and placed inside
>> >> container TLVs) then it would actually represent a safer way to deploy
>> >> this "multiple tlv" functionality.
>> >>
>> >> If the container only was used, then only routers that understood would
>> >> be able to use *any* of the TLV data. This would actually solve the
>> >> problem of "newly inserted legacy router brings everything back down"
>> >> that using a required capability bit being set on all routers has.
>> >>
>> >>>[LES:] I don't agree - and here is why. Let's use the example of
>> >>> Neighbor TLVs.
>> >>>With what you propose, when a router starts using the container TLV,
>> >>> those routers who don’t support/understand it would simply not be
>> >>> aware of the advertisement at all.
>> >>>This would result in inconsistent routing calculations on different
>> >>> routers leading to loops/blackholes.
>> >>>Hardly a benign impact.
>>
>> You're right, not sure why I thought new routers would know that old
>> routers weren't acting on the container TLV.
>>
>> However, that is the missing piece, so it works if we also add a capability
>> bit.
>> If we have the capability bit you now know which routers are processing the
>> container TLV and which aren't. That should be enough info to route
>> correctly.
>>
>> >>>There is no free lunch here. No matter what encoding scheme you come
>> >>> up with, unless all routers in the network understand it, things are
>> >>> going to break.
>> >
>>
>> Using a container TLV *and* a capability bit is not a free lunch, but it
>> should work to allow incremental deployment safely. If that's something we
>> want as a WG.
>>
>> Thanks,
>> Chris.
>> [as wg-member]
>>
>> >> This latter issue with the capability bit is why no-one wanted to use it,
>> >> and why we currently have this very sub-optimal "solution" of "just do it
>> >> and hope it works".
>> >
>> >>>[LES:] Folks (like me) who implemented MP for TLVs like Neighbor/Prefix
>> >>> were following established practice for the protocol i.e., there are
>> >>> multiple cases where this behavior is explicitly specified (please see
>> >>> MP draft for a list)
>> >>>So it made sense to use the same mechanism for other TLVs.
>> >>>We are not naïve - we understood very well that if not all routers in
>> >>> the network supported at least reception of MP TLVs that there would be
>> >>> deployment issues.
>> >>>That is why I am working with enthusiasm on the MP draft.
>> >
>> >    Les
>> >
>> >>
>> >> Thanks,
>> >> Chris.
>> >> [as wg-member]
>> >>
>> >>
>> >> P.S. the quoting style used in this thread is fabulously hard to
>> >> comprehend in a text based email client.. What's wrong with good old
>> >> ">>>" quoting style anyway?
>> >>
>> >>
>> >> "Les Ginsberg (ginsberg)" <ginsberg=40cisco....@dmarc.ietf.org> writes:
>> >>
>> >> > Huaimo –
>> >> >
>> >> >
>> >> >
>> >> > Please see inline.
>> >> >
>> >> >
>> >> >
>> >> > From: Huaimo Chen <huaimo.c...@futurewei.com>
>> >> > Sent: Sunday, March 26, 2023 3:41 AM
>> >> > To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>; lsr@ietf.org;
>> >> > draft-chen-lsr-isis-big-tlv.auth...@ietf.org
>> >> > Subject: Re: Comments on draft-chen-lsr-isis-big-tlv-00
>> >> >
>> >> >
>> >> >
>> >> > Hi Les,
>> >> >
>> >> >
>> >> >
>> >> >     Thanks much for your comments.
>> >> >
>> >> >     My responses are inline below with HC.
>> >> >
>> >> >
>> >> >
>> >> > Best Regards,
>> >> >
>> >> > Huaimo
>> >> >
>> >> >
>> >> >
>> >> > From: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
>> >> > Sent: Thursday, March 23, 2023 3:35 AM
>> >> > To: lsr@ietf.org <lsr@ietf.org>;
>> >> > draft-chen-lsr-isis-big-tlv.auth...@ietf.org <
>> >> > draft-chen-lsr-isis-big-tlv.auth...@ietf.org>
>> >> > Subject: Comments on draft-chen-lsr-isis-big-tlv-00
>> >> >
>> >> >
>> >> >
>> >> > Folks -
>> >> >
>> >> >
>> >> >
>> >> > This draft proposes a new way to handle advertisement of more than
>> >> > 255 bytes of information about a given object.
>> >> >
>> >> > It defines a new "container TLV" to carry additional information
>> >> > about an object beyond the (up to) 255 bytes of information
>> >> > advertised in an existing TLV.
>> >> >
>> >> >
>> >> >
>> >> > The draft is defining a solution to a problem which has already been
>> >> > addressed without requiring any protocol extensions.
>> >> >
>> >> > [HC]: It seems that a protocol includes a set of procedures. Would
>> >> > you mind telling me which existing protocols can be used to resolve
>> >> > the problem without requiring any protocol extensions?
>> >> >
>> >> >
>> >> >
>> >> > [LES:] Please read draft-pkaneria-lsr-multi-tlv-02 carefully.
>> >> >
>> >> > Section 1 documents that there are existing RFCs which explicitly
>> >> > state that multiple TLVs for the same object are allowed to be sent.
>> >> >
>> >> > What the draft goes on to discuss is the use of the same mechanism
>> >> > (sending multiple TLVs for the same object) in cases where existing
>> >> > RFCs have not explicitly stated this behavior.
>> >> >
>> >> >
>> >> >
>> >> > It is also a fact that there are multiple implementations from
>> >> > multiple vendors already shipping that utilize this mechanism for
>> >> > TLVs such as Neighbor and Prefix reachability.
>> >> >
>> >> >
>> >> >
>> >> > The existing solution - discussed in
>> >> > https://datatracker.ietf.org/doc/draft-pkaneria-lsr-multi-tlv/ - has
>> >> > already been successfully implemented and deployed by multiple vendors.
>> >> >
>> >> > [HC]: You are a co-author of this draft, called a first draft for
>> >> > resolving the problem on big TLVs. This first draft contains some
>> >> > protocol extensions. If there is a solution for the problem without
>> >> > requiring any protocol extensions, then why do you as a co-author
>> >> > work on the first draft with protocol extensions?
>> >> >
>> >> >
>> >> >
>> >> > [LES:] There are no protocol extensions defined in
>> >> > draft-pkaneria-lsr-multi-tlv-02 (please see the statement in the IANA
>> >> > section). The draft has been written to clarify existing behavior and
>> >> > to discuss best deployment practices in cases where not all
>> >> > implementations support reception of multiple TLVs for a given
>> >> > object.
>> >> >
>> >> >
>> >> >
>> >> > The definition of a second solution to the problem is not needed -
>> >> > and in fact further complicates both implementation and deployment.
>> >> > Should the existing solution be used? Should the new solution be
>> >> > used? What is the state of support by all nodes in the network for
>> >> > each solution?
>> >> >
>> >> > [HC]:  It seems better to merge the two drafts (i.e., the first draft
>> >> > and the second draft defining container TLV) into one.
>> >> >
>> >> >
>> >> >
>> >> > [LES:] This would be the worst possible outcome.
>> >> >
>> >> > It would define two mechanisms for sending more than 255 bytes of
>> >> > information about an object.
>> >> >
>> >> > This would require implementations to support two different
>> >> > mechanisms for advertising the same information – also requiring the
>> >> > ability to control which mechanism should be used in a given
>> >> > deployment and even raising the possibility that both forms would
>> >> > need to be sent in parallel. This adds unnecessary complexity to
>> >> > implementations.
>> >> >
>> >> >
>> >> >
>> >> > For operators deploying features+scale which require such support,
>> >> > they would now have to identify not only whether all implementations
>> >> > in their deployment support sending/receiving more than 255 bytes/
>> >> > object, but also which form of advertisement is supported – further
>> >> > complicating deployment considerations.
>> >> >
>> >> >
>> >> >
>> >> > And since there are explicit statements requiring the current form of
>> >> > advertisement to be used for some TLVs, behavior would potentially
>> >> > differ on a per TLV basis.
>> >> >
>> >> >
>> >> >
>> >> > The motivation for the new solution seems to be the notion that it
>> >> > supports partial deployment. Text in
>> >> > https://www.ietf.org/archive/id/draft-chen-lsr-isis-big-tlv-00.html#name-incremental-deployment
>> >> > states:
>> >> >
>> >> >
>> >> >
>> >> > "For a network using IS-IS, users can deploy the extension for big
>> >> > TLV in a part of the network step by step.
>> >> >
>> >> > The network has some nodes supporting the extension (or say new nodes
>> >> > for short) and the other nodes not
>> >> >
>> >> > supporting the extension (or say old nodes for short) before the
>> >> > extension is deployed in the entire network."
>> >> >
>> >> >
>> >> >
>> >> > This suggests the authors believe that a network can function with
>> >> > some nodes using all of the advertisements and some nodes using only
>> >> > the legacy advertisements, but this is obviously false.
>> >> >
>> >> > Fundamental to operation of a link state protocol is that all nodes
>> >> > in the network operate on identical LSPDBs.
>> >> >
>> >> > The suggestion that features will work correctly when some nodes use
>> >> > attributes advertised in legacy TLVs and the new container TLV while
>> >> > some nodes use only the attributes advertised in legacy TLVs is
>> >> > simply incorrect.
>> >> >
>> >> > [HC]: Every node in the network has the same LSPDB. The new nodes
>> >> > understand the new container TLVs and may use them. The old nodes do
>> >> > not understand them and do not use them.
>> >> >
>> >> >
>> >> >
>> >> > [LES:] Consider the following simple example.
>> >> >
>> >> > Node A needs to send 10 sub-TLVs about a particular object –
>> >> > requiring more than 255 bytes to be sent.
>> >> >
>> >> > Some nodes in the network do not support reception of more than 255
>> >> > bytes/object. Consider two such nodes.
>> >> >
>> >> > Node B, based on the local configuration, needs to be able to receive
>> >> > sub-TLVs 1,3,5,7,9 from A.
>> >> >
>> >> > Node C, based on local configuration, needs to be able to receive
>> >> > sub-TLVs 2,4,6,8,10 from A.
>> >> >
>> >> >
>> >> >
>> >> > There is no way that A can advertise all 10 sub-TLVs in a way which
>> >> > allows both B and C to correctly process the sub-TLVs they require.
>> >> >
>> >> >
>> >> >
>> >> > Network functionality is compromised.
>> >> >
>> >> >
>> >> >
>> >> > It is true that even with the existing solution unless all nodes are
>> >> > capable of processing more than 255 bytes of information/object
>> >> > network functionality will be compromised. That is exactly what
>> >> > motivated the writing of draft-pkaneria-lsr-multi-tlv.
>> >> >
>> >> > But your proposal does nothing to make that requirement easier to
>> >> > address. It in fact makes implementation/deployment even more
>> >> > difficult – as I have described above.
>> >> >
>> >> >
>> >> >
>> >> > It is also important to state that the advertisement of more
>> >> > than 255 bytes of information is driven by configuration – not a
>> >> > protocol implementation choice. Suppressing advertisement of some of
>> >> > the configured information also does not result in a working network.
>> >> >
>> >> >
>> >> >
>> >> > In short, there is no positive value from the proposed extension –
>> >> > and it does harm by further complicating implementations and
>> >> > deployments.
>> >> >
>> >> > [HC]: The second draft defines a general mechanism for resolving the
>> >> > problem. It is backward compatible and simple.  It does not do any
>> >> > harm.
>> >> >
>> >> >
>> >> >
>> >> > [LES:] You are proposing a second solution for a problem that has
>> >> > already been solved. In doing so you are introducing new problems and
>> >> > not solving any existing issues. Saying this “does no harm” is
>> >> > clearly false.
>> >> >
>> >> >
>> >> >
>> >> >    Les
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > The draft should be abandoned.
>> >> >
>> >> >
>> >> >
>> >> >     Les
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Lsr mailing list
>> >> > Lsr@ietf.org
>> >> > https://www.ietf.org/mailman/listinfo/lsr


