[+Les, Guillaume, as we go quite deep in the discussion]

Hi Mirja,

Thank you for your review and comments. Very useful.

Please see inline [Bruno]
 
> From: Mirja Kühlewind via Datatracker <nore...@ietf.org> 
> Sent: Friday, February 2, 2024 3:57 PM
> 
> Reviewer: Mirja Kühlewind
> Review result: Not Ready
> 
> First of all I have a clarification question: The use the of flags TLV with 
> the O flag is not clear to me. Is that also meant as a configuration 
> parameter or is that supposed to be a subTLV that has to be sent together 
> with the PSNP? If it is a configuration, doesn’t the receiver need to confirm 
> that the configuration is used and how does that work in the LAN scenario 
> where multiple configurations are used? If it has to be sent together with 
> the PSNP, this needs to be clarified and it seem a bit strange to me that it 
> is part of the same TLV. Or maybe I’m missing something completely about the 
> flag?

[Bruno] The O-flag is advertised by the receiver in the Flags sub-TLV, which 
may be sent either in PSNPs or IIHs.
It is not a configuration but a capability of the receiver, which is signaled 
to the sender.
It is only applicable to the point-to-point scenario, not the LAN scenario 
(as on a LAN there is no explicit acknowledgment of the receipt of LSPs 
between a given LSP transmitter and a given LSP receiver).

> it seem a bit strange to me that it is part of the same TLV

[Bruno] 
All those sub-TLVs, at least the ones currently defined, carry (relatively) 
static parameters and are not required to be sent in all IIHs or PSNPs. The 
way IS-IS acknowledges the reception of LSPs is not changed.
They are all grouped in a single TLV called the "Flooding Parameters TLV" for 
grouping purposes and also because IS-IS has a limited TLV space.
If the above does not clarify, could you please elaborate on what feels 
"strange" to you?

 
> Then, generally thank you for considering overload and congestion carefully.
> Please see my many comments below, however, I think one important part is to 
> ensure that the network/link doesn’t get normally overloaded with the 
> parameter selected. You give some recommendation about parameters to use but 
> not for all and, more importantly, it would be good to define boundaries for 
> safe use.
> What’s a “safe” min or max value? I know this question is often not easy to 
> answer, however, if you as the expert don’t give the right recommendations, 
> how should a regular implementer make the right choice?

[Bruno]
Very fair points. And thank you for acknowledging that this question is not 
easy to answer...
TL;DR: sorry, I don't know.

A few general statements first:
- IS-IS runs in the control plane of two adjacent routers, typically in the 
backbone of network operators. There is a single point-to-point link over 
fiber, typically at the latest interface speeds (e.g., > 100G today). I would 
not assume that IS-IS would overload, or even significantly load, this 
interface. From a jitter standpoint, packet priority/CoS could be discussed, 
but I'm assuming that this is a different discussion.
- Currently IS-IS has neither flow control nor congestion control. Given 
this, current values are very conservative (e.g., one packet every 33 ms). At 
the same time this is very important signaling for the network: we would 
prefer not to drop LSPs, but on the other hand delaying LSPs for seconds does 
not help either. For historical and good reasons, IS-IS implementers are very 
conservative. As of today, I would not assume that they would be too 
aggressive.
- One problem with stating values in an RFC is that those values may not age 
well. That's typically the case with IS-IS, where some parameter values are 
still 25 years old while CPUs and networks have evolved significantly since 
then. So I'm a bit reluctant to write static values in stone again.

Coming back to min and max values:
- I'm not an implementer, but I do care about my networks. If I were an 
implementer, I would play safe and advertise values which are safe for my 
implementation as a receiver. I would rather use those values as a protection 
from a too-aggressive sender than as a permission to overload (DoS) me. Both 
sender and receiver are within the same administrative domain (for certain). 
In case of an issue, the network operator will debug both the sender and the 
receiver and blame the one which did not behave.
- There is a wide range of router sizes/prices/capabilities/generations. Plus 
I would expect these values to gradually improve as implementations improve, 
thanks to the capabilities introduced by this document.
- I'm definitely not a transport/TCP expert. Does TCP define such min and max 
values?
 
We propose some guidance for the LPP (which is somewhat IS-IS specific) in 
https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-5.1
But for other parameters, I'm not sure that indicating values would be useful.

> Please see further comments below.
> 
> Section 4.7:
> “NOTE: The focus of work used to develop the example algorithms discussed 
> later in this document focused on operation over point-to-point interfaces. A 
> full discussion of how best to do faster flooding on a LAN interface is 
> therefore out of scope for this document.”
> 
> Actually this is quite important and also not clear to me. You do discuss how 
> to interpret parameters in a LAN scenario but then you say you only give 
> proper guidance how to adjust the sending rate for non-LAN. But what’s the 
> right thing to do in LAN then? Why is LAN out of scope? If you don’t give 
> guidance, I think you have to also say that this mechanism that enables using 
> higher values in this document MUST NOT be used on LAN setups.

[Bruno] On a point-to-point link there is one sender for one receiver. On a 
LAN, there are N receivers for one sender, and possibly N senders for each 
receiver. The guidance is whether the multiplicative factor is to be handled 
by the sender or the receiver. The document says that the value is used by 
the sender as-is, so it's up to the receiver to take into account the number 
of speakers on the LAN. This guidance seems required for correct semantics.

Then the TLV may carry different sub-TLVs. Some may be applicable to a LAN 
(e.g., Burst Size).
Some are less applicable to a LAN because the way IS-IS acknowledges LSPs on 
a LAN is different and less dynamic. The LAN case is both less frequent these 
days (if not rare in backbones) and more difficult to handle, as IS-IS 
acknowledges LSPs in a slow and less explicit way, hence we have a loose 
feedback loop to use. Eventually, someone could define a new sub-TLV or 
procedure to improve the LAN case, therefore I don't think that we should 
define the TLV as not applicable to LANs.
https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.1.2
 does partially discuss and cover the LAN case. Possibly the operation could 
be further improved, but I think that it's too late to add specification for 
the LAN case. This may be covered in a subsequent document (but really, the 
LAN is not a priority these days).


> 
> Section 5.1:
> “The receiver SHOULD reduce its partialSNPInterval. The choice of this lower 
> value is a local choice. It may depend on the available processing power of 
> the node, the number of adjacencies, and the requirement to synchronize the 
> LSDB more quickly. 200 ms seems to be a reasonable value.”
> 
> Giving some recommended value is fine, however, it would be more important to 
> ensure safe operation to give a range or at least a minimum value.

[Bruno] The maximum value is defined in the "old" IS-IS spec. A minimal value 
seems very implementation specific to me.
More importantly, I don't think that safety comes into play here, but the 
text could be more explicit on this. The goal is for the receiver to provide 
"frequent" feedback to the sender so that the sender can adapt faster and 
hence be "safer": "Faster LSP flooding benefits from a faster feedback loop. 
This requires a reduction in the delay in sending PSNPs."
Nothing breaks if the receiver is too slow. Quite the contrary: the flow 
control algorithm would slow down, hence be on the safe side.

In order to be more explicit, I would propose the addition of the following 
text:
"The value of the "Partial SNP Interval sub-TLV" MAY be used by the sender for 
flow control and congestion control. It MUST NOT be used to trigger LSP 
retransmission."
I'd rather add this in section 4.5 which defines the sub-TLV
https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#name-partial-snp-interval-sub-tl

> 
> Also on use of normative language. Just saying “The receiver SHOULD reduce 
> its partialSNPInterval.” Is a bit meaningless without saying when and to with 
> value/by how much. I guess you should say something like “partialSNPInterval 
> SHOULD be set to 200ms and MUST NOT be lower than X.”

[Bruno] Good point. Thank you. Proposed change:
OLD:  The receiver SHOULD reduce its partialSNPInterval.
NEW: For the generation of PSNPs, the receiver SHOULD use a partialSNPInterval 
smaller than the one defined in [ISO10589].

 
> “The LPP SHOULD also be less than or equal to 90 as this is the maximum 
> number of LSPs that can be acknowledged in a PSNP at common MTU sizes, hence 
> waiting longer would not reduce the number of PSNPs sent but would delay the 
> acknowledgements. Based on experimental evidence, 15 unacknowledged LSPs is a 
> good value assuming that the Receive Window is at least 30 and that both the 
> transmitter and receiver have reasonably fast CPUs.”
> 
> Why is the first SHOULD a SHOULD and not a MUST? What is a reasonable fast 
> CDU?

[Bruno] The first "SHOULD" is a SHOULD because nothing breaks if it's not 
applied. Also, the goal is for the receiver to provide frequent/fast feedback 
to the sender. If the throughput were very high (e.g., 1 Gb/s in a distant 
future), sending an acknowledgement (PSNP) every 90 LSPs would provide 
feedback every 1 ms, which seems relatively responsive, especially compared 
to the link RTT in a WAN. Possibly this could be rephrased to better focus on 
the need.
OLD:   The LPP SHOULD also be less than or equal to 90 as this is the maximum 
number of LSPs that can be acknowledged in a PSNP at common MTU sizes, hence 
waiting longer would not reduce the number of PSNPs sent but would delay the 
acknowledgements.
NEW:  The LPP SHOULD be less than or equal to the maximum number of LSPs that 
can be acknowledged in a PSNP because waiting longer would not reduce the 
number of PSNPs sent but would delay the acknowledgements. This is 90 at common 
MTU sizes. 
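For the curious reader, the "90 at common MTU sizes" can be roughly rechecked 
with back-of-the-envelope arithmetic. The header sizes below are my 
assumptions from ISO 10589 (each PSNP LSP entry is 16 bytes: Remaining 
Lifetime 2, LSP ID 8, Sequence Number 4, Checksum 2), so treat this as a 
sketch, not normative text:

```python
# Rough check of "90 LSPs acknowledged per PSNP at common MTU sizes".
# Assumptions: 8-byte common header + 2-byte PDU length + 7-byte
# source ID = 17 bytes of fixed PSNP header; LSP Entries TLVs carry
# at most 255 bytes of 16-byte entries each, plus a 2-byte TLV header.
ENTRY = 16                      # bytes per LSP entry
PER_TLV = 255 // ENTRY          # 15 entries per LSP Entries TLV
HDR = 8 + 2 + 7                 # fixed PSNP header bytes (assumption)
MTU = 1492                      # common payload size

room = MTU - HDR
full_tlvs, rest = divmod(room, 2 + PER_TLV * ENTRY)
entries = full_tlvs * PER_TLV + max(0, (rest - 2) // ENTRY)
print(entries)                  # → 91, within one of the quoted 90
```

So the quoted 90 is consistent with common MTU sizes under these assumptions.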

> What is a reasonable fast CDU?
Touché! That's the issue with indicating fixed values which will be outdated 
in the future.
Proposed change:
OLD:   Based on experimental evidence, 15 unacknowledged LSPs is a good value 
assuming that the Receive Window is at least 30 and that both the transmitter 
and receiver have reasonably fast CPUs.
NEW:  Based on experimental evidence, 15 unacknowledged LSPs is a good value 
assuming that the Receive Window is at least 30.

 
> Why would the receive window be 30? Is that also the value that you would 
> recommend? So you maybe more generally aim to recommend to set the LPP to 
> half the Receive Window (or does it have to be those specific values)?

[Bruno] The LPP value is discussed in 
https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.2.5
I would propose to add 
NEW: The choice of the LPP value is discussed in 
https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.2.5

To answer your questions:
- In tests, we found that LPP 15 is a good trade-off between an increased 
feedback rate to the sender and the increased acknowledgement load on both 
the receiver and the sender.
- As indicated in 
https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-6.2.2.5
 for performance reasons it's better if LPP is an integer fraction of the 
Receive Window. Hence choosing LPP 15 assumes that the Receive Window is 30.
- We don't recommend the Receive Window to be 30. In general, the larger the 
better for links with high RTT. However, this is a new concept for IS-IS and 
indicating a high number may scare IS-IS implementers. 30 LSPs is a 45 KB 
receive window, which seems relatively small for control plane memory 
(typically between a laptop and a small server) and for a critical protocol.
- I agree with you that setting LPP to half the Receive Window works fine. 
But a third would probably be even better with a large Receive Window. It 
really depends on the Receive Window, and the goal is to provide fast 
feedback: if the Receive Window is very large, we don't necessarily want to 
delay PSNPs too much.
- It definitely does not have to be a specific value.
- Essentially, the LPP adds a delay to the feedback loop, in addition to the 
link RTT (it adds LPP/"LSP sending rate"). Depending on the Receive Window 
and link RTT, LPP may reduce the achievable rate. But in many cases it would 
not. Maybe indicating that LPP/"desired LSP sending rate" should be 
significantly smaller than the link RTT would help the reader, but on the 
other hand I feel that it's a bit late for a change like this, unless you 
would support it given your experience in the transport area.
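To put numbers on that last bullet (all values below are hypothetical, just 
to illustrate the arithmetic):

```python
# Extra feedback delay introduced by LPP: the time to send LPP LSPs,
# which adds to the link RTT before the next PSNP can arrive.
# All numbers are hypothetical (integer ms to keep arithmetic exact).
lpp = 15
lsp_rate = 1000          # assumed sending rate, LSPs per second
link_rtt_ms = 10         # assumed WAN link RTT, milliseconds

lpp_delay_ms = lpp * 1000 // lsp_rate      # 15 LSPs at 1k LSP/s
feedback_delay_ms = link_rtt_ms + lpp_delay_ms
print(lpp_delay_ms, feedback_delay_ms)     # → 15 25
```

So at 1k LSP/s the LPP term dominates a 10 ms link RTT, while at higher 
sending rates it quickly becomes negligible.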



> 
> Section 5.2:
> 
> “Therefore implementations SHOULD prioritize the receipt of Hellos and then 
> SNPs over LSPs. Implementations MAY also prioritize IS-IS packets over other 
> less critical protocols.”
>
> 
> What do mean by prioritise exactly? I find the second sentence here 
> meaningless when you only say “less critical protocols”. What do you mean by 
> this? How should I as an implementer decide which protocols are more or less 
> critical?

[Bruno] On routers, transit packets are typically forwarded at line rate. But 
packets going from a set of very high-speed interfaces (e.g., an aggregated 
received bandwidth of 10 Tb/s) to the router's control plane (e.g., a 
laptop-class CPU) face a significant bottleneck. Typically this bottleneck 
can give priority to some packets and may rate-limit flows to protect from 
(D)DoS.
On one hand, the relative priority between protocols is indeed a local choice 
of the implementer. Based on experience, they should know. I don't expect 
this to be novel, but as we increase the rate of IS-IS LSPs, the point 
becomes more important, so we felt we should raise it. To some extent, the 
question is the same for CPU allocation and scheduling. E.g., I would give 
more priority to IS-IS compared to BMP (monitoring) or even BGP, even though 
BGP is an important routing protocol. Essentially, IS-IS is critical, the 
foundation of routing in the network, and its computation just assumes that 
flooding is "perfect", so it is sensitive to a lack of global database 
consistency.
 
> Section 6.1:
> “Congestion control creates multiple interacting control loops between 
> multiple transmitters and multiple receivers to prevent the transmitters from 
> overwhelming the overall network.”
> 
> This is an editorial comment: I think I know what you mean but the sentence 
> is not clear as there is always only one congestion loop between one 
> transmitter and one receiver.

[Bruno] Yes, you know this much better than us, hence a suggestion would be 
welcomed 😉.
I don't feel that the sentence contradicts your point:
- I agree that "there is always only one congestion loop between one 
transmitter and one receiver".
- The sentence is trying to say that there are multiple senders, hence 
multiple control loops, and that they affect each other as they may compete 
for the same common resource on the way.

We are trying to explain the difference between flow control and congestion 
control, why both are useful, and possibly why congestion control is harder. 
Suggestions are welcome.

> 
> Section 6.2.1:
> “If no value is advertised, the transmitter should initialize rwin with its 
> own local value.”
> 
> I think you need to give more guidance anyway but a good buffer size might be.
> However, if you don’t know the other ends capability, I’m not sure if you own 
> value is a good idea or if it would be better to be rather conservative and 
> select a low value that still provides reasonable performance.

[Bruno] Actually we meant
OLD:  If no value is advertised, the transmitter should initialize rwin with 
its own local value.
NEW: If no value is advertised, the transmitter should initialize rwin with its 
locally configured value for this neighbor.

As 
https://datatracker.ietf.org/doc/html/draft-ietf-lsr-isis-fast-flooding-06#section-4
 says, if a value is not advertised, a locally configured value should be 
used.

I agree with you that more guidance would be helpful. Thank you for the 
comment.
Yet it is not completely easy, especially for IS-IS, as this is a new concept 
and implementations may be very different.

We could add the following text at the end of §6.2.1
NEW: The RWIN value is of importance when the RTT is the limitation. In this 
case the optimal size is the desired LSP rate multiplied by the RTT, the RTT 
being the sum of the link RTT plus the time taken by the receiver to 
acknowledge the first received LSP in its PSNP. 50 or 100 may be reasonable 
default numbers.
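As a sanity check on those defaults, the bandwidth-delay-product style sizing 
(rate times RTT) can be illustrated with hypothetical rates and RTTs (integer 
milliseconds just to keep the arithmetic exact):

```python
# RWIN sizing: the number of LSPs that must be in flight to sustain a
# target rate over a given RTT is rate * RTT. Numbers are hypothetical.
def rwin_for(rate_lsps_per_s: int, rtt_ms: int) -> int:
    return rate_lsps_per_s * rtt_ms // 1000

print(rwin_for(1000, 50))    # → 50  (1k LSP/s over a 50 ms RTT)
print(rwin_for(10000, 10))   # → 100 (10k LSP/s over a 10 ms RTT)
```

Both results land on the 50/100 ballpark proposed above.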

Comments and suggestions welcome.
FYI:
- Typical hardware is the equivalent of a high-end laptop, although RAM tends 
to be less restricted these days (e.g., 128 GB).
- I would rate IS-IS as critical, so it would have priority.
- 100 IS-IS neighbors seems a reasonable medium range.
- 1k to 10k LSP/s would already be great.
- Link RTT is very variable and depends on link distance. It could be 10 km 
in a dense area, 100s of km in a typical WAN in Europe; 1000 km seems high 
given country sizes and optical capabilities. The max would be 
intercontinental links crossing oceans.

> Section 6.2.1.1:
> “The LSP transmitter MUST NOT exceed these parameters. After having sent a 
> full burst of un-acknowledged LSPs, it MUST send the following LSPs with an 
> LSP Transmission Interval between LSP transmissions. For CPU scheduling 
> reasons, this rate may be averaged over a small period, e.g., 10-30ms.”
> 
> I not sure I fully understand what you mean by “averaged over a small period”?
> What exactly?

[Bruno] 
The rate is averaged over a small period (T): #LSPs during T / T. In which 
case, within this period, bursting is allowed.
E.g., with 100 LSPs / 10 ms, one may send 10 LSPs every 1 ms rather than a 
strict 1 LSP every 0.1 ms.
In any case, the burst size is not to be exceeded.

If this is not clear, any suggestions are welcome.
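A small sketch of what "averaged over a small period" could mean in practice 
(the scheduling policy here is my assumption, not the draft's text; 
microsecond integers avoid floating-point noise):

```python
# LSPs due within one averaging period are sent as a single burst at
# the start of that period, instead of strictly one per interval.
# This is one possible policy, shown for illustration only.
def send_times_us(total, interval_us, period_us):
    bursts = {}
    for i in range(total):
        due = i * interval_us                    # strict per-LSP pacing time
        slot = due // period_us * period_us      # start of its averaging period
        bursts[slot] = bursts.get(slot, 0) + 1
    return sorted(bursts.items())

# 100 LSPs, one every 100 us, averaged over 1 ms:
# bursts of 10 LSPs at t = 0, 1000, 2000, ... us
print(send_times_us(100, 100, 1000)[:3])   # → [(0, 10), (1000, 10), (2000, 10)]
```

The same 10,000 LSP/s average rate is preserved; only the intra-period 
spacing changes, and the burst size cap still applies on top.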
 
> Section 6.2.1.2:
> “f no PSNPs have been generated on the LAN for a suitable period of time, 
> then an LSP transmitter can safely set the number of un-acknowledged LSPs to 
> zero.
> Since this suitable period of time is much higher than the fast 
> acknowledgment of LSPs defined in Section 5.1, the sustainable transmission 
> rate of LSPs will be much slower on a LAN interface than on a point-to-point 
> interface.”
> 
> What a suitable period of time? Can you be more concrete?

[Bruno] Good question, but a difficult one.
Les, would you have a suggestion?
Otherwise, rather than adding more text for the LAN case, I'd rather remove 
some text with the following change:

OLD:
However, an LSP transmitter on a LAN can infer whether any LSP receiver on the 
LAN has requested retransmission of LSPs from the DIS by monitoring PSNPs 
generated on the LAN. If no PSNPs have been generated on the LAN for a suitable 
period of time, then an LSP transmitter can safely set the number of 
un-acknowledged LSPs to zero. Since this suitable period of time is much higher 
than the fast acknowledgment of LSPs defined in Section 5.1, the sustainable 
transmission rate of LSPs will be much slower on a LAN interface than on a 
point-to-point interface.

NEW: /nothing/

As already stated, one could probably do better in the LAN case, e.g., 
advertising the delay between periodic CSNPs (which would answer your 
question), sending in (LSP-ID) order, or having the receiver send a PSNP on a 
range of LSP-IDs after a specific delay/LPP. But again, the LAN is not seen 
as a priority in this document.
 
> Section 6.2.2.1
> 
> - As a side note, I don’t think figure 1 is useful at all…

[Bruno] OK. Note that Figure 2 somewhat refers to Figure 1.
Would you suggest removing it entirely or adding more information? E.g.:
   +---------------+
   |               |
   |               v
   |     cwin = cwin0 = LPP + 1
   |   +----------------------+
   |   | Congestion avoidance |
   |   | cwin increases as    |
   |   | LSPs are acked       |
   |   +----------------------+
   |               |
   |               | Congestion signal
   +---------------+

> - cwin = LPP + 1: Why is LPP selected as the start/minimum value? Earlier on 
> you say that LPP must be equal or less than 90 and recommend a value of 15.
> These values seem already large.

[Bruno] The receiver will not acknowledge anything before LPP LSPs are sent 
(*). In the absence of feedback, the sender has no feedback loop and not even 
information about the RTT. So we may shape the LSPs being sent, but we do 
need to send LPP LSPs to get some feedback. The +1 is to allow for the loss 
of one LSP.
(*) Well, it will acknowledge after a delay (Partial SNP Interval), but the 
suggested value is 200 ms, which seems significant, and some implementations 
may not advertise this sub-TLV, in which case the default IS-IS value is in 
seconds.

Retrospectively, we could have requested the receiver to quickly acknowledge 
the first received LSP (e.g., starting with LPP=1 and then increasing it). 
But we didn't, and this may be seen as too late for a change. Still, adding 
this would not make existing implementations non-compliant and may improve 
future evolutions. So I would welcome your feedback on this.
Also, this argument is only applicable to the first iteration of congestion 
avoidance. Subsequent ones could use information from the past.

Regarding a cwin of 16: yes, that's a significant start, but we know that the 
single link can largely handle this (>100 Gb/s), that Diffserv CoS is enabled 
if needed, and that some existing implementations already blindly send an 
initial burst of 5 to 10 LSPs (e.g., 10 for Cisco IOS "default optimized 
enabled", which is already years old):
https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/seg_routing/configuration/xe-17/segrt-xe-17-book-cat8000/sr-fast-convergence-default-optimize.pdf
Also, we are talking about a "carrier grade" router with a significant price 
tag and an assumed significant performance.
Even the 25-year-old OSI specification allows for an initial burst of 10 
back-to-back LSPs.

That being said, if you feel that this is not appropriate, the expertise is 
on your side, so please say so.
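To make the intended loop concrete, here is a toy model (my reading of the 
draft, not normative text; the additive-increase step, the rwin cap, and the 
round counts are assumptions for illustration): cwin starts at LPP + 1, grows 
during congestion avoidance as LSPs are acknowledged, and drops back to the 
initial window on a congestion signal.

```python
# Toy model of the congestion window evolution (illustrative only).
LPP = 15
CWIN0 = LPP + 1      # initial window: LPP LSPs to elicit a PSNP, +1 for a loss

def next_cwin(cwin, acked, congestion, rwin):
    if congestion:
        return CWIN0                         # reset on a congestion signal
    # congestion avoidance: roughly +1 per cwin acknowledged LSPs,
    # never exceeding the receiver's advertised window
    return min(rwin, cwin + acked / cwin)

cwin = CWIN0
for _ in range(10):                          # 10 rounds of LPP acks each
    cwin = next_cwin(cwin, acked=LPP, congestion=False, rwin=30)
grown = cwin
cwin = next_cwin(cwin, acked=0, congestion=True, rwin=30)
print(grown > CWIN0, cwin == CWIN0)          # → True True
```

Whether the increase should be TCP-like or faster (given the low RTTs) is 
exactly the open question discussed below in 6.2.2.4.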

> Section 6.2.2.2:
> “This value should include a margin of error to avoid false positives (e.g., 
> estimated MAT measure variance) which would have a significant impact on 
> performance.”
> 
> First, you call the first congestion signal “Timer” however I think it should 
> be call “Loss” where the loss detection algorithm you are proposing is based 
> on a timer. In TCP the retransmission (and therefore loss detection) timer is 
> initially set to 3xRTT and then adapted with more measurements (see RFC6298).
> The text above, however, seems really too vague to implement that right. I 
> guess you can take the simple approach and just set it to 3xRTT. However, 
> given the delays on a point-to-point link are really small, I’m not sure a 
> timer based on RTT is useful at all. Is the system even able to maintain a 
> timer with that granularity? My understanding, however, is that LSP has a way 
> to detect loss already in order to retransmit, therefore it would make more 
> sense simply always reset the cwin when you retransmit a PDU. Or how does the 
> LSP decide to send a retransmission?

[Bruno] Agreed on the name "Loss".
I'm personally fine with the proposed simple way of using 3*RTT. What about 
also referring to RFC 6298?
Note that in our case, the timer is only used for detecting congestion. 
Retransmission uses a much larger timer (and is unchanged by this document).

Regarding link delay: within a dense area, link delay would typically be 
"negligible". In a WAN, e.g., in France, link delay would easily be in the 
1 ms - 10 ms range. Crossing the Atlantic Ocean is larger. Also, the RTT 
includes the time for IS-IS to acknowledge. I would assume that this may be 
significant (10 ms, or maybe more than 100 ms) in some conditions, such as if 
the routing process is handling both IS-IS flooding and BGP computations.
In terms of system timer granularity, this seems very implementation and 
condition specific.

Guillaume, please disagree if needed.
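If we do reference RFC 6298, the adaptive timer could look like the sketch 
below (a non-normative illustration; the RTT samples are hypothetical, and I 
deliberately left out the 1-second minimum RTO of RFC 6298, since LSP 
acknowledgement RTTs here can be far below 1 s):

```python
# RFC 6298-style smoothed RTT / RTO computation, sketched for the
# PSNP feedback loop. RTT samples below are hypothetical.
ALPHA, BETA, K = 1 / 8, 1 / 4, 4

def update_rto(sample, srtt=None, rttvar=None):
    """Feed one RTT sample (seconds); return (srtt, rttvar, rto)."""
    if srtt is None:                     # first measurement (RFC 6298 §2.2)
        srtt, rttvar = sample, sample / 2
    else:                                # subsequent measurements (§2.3)
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar, srtt + K * rttvar

srtt = rttvar = None
for sample in (0.010, 0.012, 0.011):     # hypothetical 10-12 ms samples
    srtt, rttvar, rto = update_rto(sample, srtt, rttvar)
print(round(rto * 1000, 1))              # → 23.8 (ms), roughly 2-3x the RTT
```

Compared to a fixed 3*RTT, this tracks measured variance, which may matter 
when the acknowledgement time (not the fiber) dominates the RTT.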

> 
> “Reordering: a sender can record its sending order and check that 
> acknowledgements arrive in the same order as LSPs. This makes an additional 
> assumption and should ideally be backed up by a confirmation by the receiver 
> that this assumption stands.”
> 
> Regarding re-ordering as an input: if a packet if “just” re-ordered but still 
> received, it should not be taken as a congestion signal.  However, can that 
> even happen on a point-to-point link? If you mean the packet was never 
> received and there is a gap in the packet number, that’s again called loss 
> and not reordering, but simply using a packet number based detection 
> mechanism instead of a timer. However, based on the description above it is 
> not fully clear to me how you think this would work and what you mean by 
> “additional assumption”…? I think you further need to clarify this.

[Bruno] IS-IS has no notion of packet numbers nor re-ordering. The receiver 
may acknowledge LSPs in any order. The receiver acknowledges sets of LPP 
LSPs, and the ordering within that set does not matter.
In addition, we don't want to change the way LSPs are retransmitted. So we 
are ready to make a distinction between the loss signal (e.g., not 
acknowledged within 5 seconds) and the congestion signal (e.g., not 
acknowledged in order).
With regard to point-to-point interfaces, clearly the fiber will not reorder 
packets. There could be multiple Ethernet members "below" the IP layer, but 
AFAIK a given flow should always be sent on the same Ethernet member (here 
the flow would be based on the group MAC address, as IS-IS does not use IP).

Trying to clarify:
OLD:  Reordering: a sender can record its sending order and check that 
acknowledgements arrive in the same order as LSPs. This makes an additional 
assumption and should ideally be backed up by a confirmation by the receiver 
that this assumption holds. The O flag defined in Section 4.4 serves this 
purpose.
NEW: Reordering: if the receiver has signaled the O-flag (Ordered 
Acknowledgement) (Section 4.4), a sender MAY record its sending order and 
check that acknowledgements arrive in the same order. If they do not, this 
MAY be used to trigger a congestion signal.
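To illustrate the intent of that NEW text, here is a minimal sketch of the 
ordered-acknowledgement check (simplified to per-LSP acknowledgements, 
whereas a real sender works on sets of LPP LSPs per PSNP; the LSP names are 
made up):

```python
# Sketch of the O-flag ordered-acknowledgement check (only applicable
# when the receiver has advertised the O-flag). Simplified model.
from collections import deque

def out_of_order(send_order, ack_order):
    """True if an ack arrives for an LSP that is not the oldest unacked one."""
    outstanding = deque(send_order)
    for acked in ack_order:
        if not outstanding or outstanding[0] != acked:
            return True                  # gap or reorder -> congestion signal
        outstanding.popleft()
    return False

sent = ["lsp1", "lsp2", "lsp3", "lsp4"]
print(out_of_order(sent, ["lsp1", "lsp2", "lsp3", "lsp4"]))  # → False
print(out_of_order(sent, ["lsp1", "lsp3"]))                  # → True (lsp2 skipped)
```

Note this only signals congestion; retransmission still relies on its own, 
much larger, timer as stated above.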

> Sec 6.2.2.3:
> 
> You call this refinement “Fast recovery” which I guess is inspired by TCP.
> However, it’s a bit confusing because in TCP i’st a different algorithm. In 
> TCP’s fast recovery, you do not reduce your congestion window to the initial 
> value but only halve it (or decrease by a different factor depending on the 
> congestion control algorithm). This is done if only some packets are lost 
> while most packets still arrive. In TCP, resetting to the initial value only 
> happens if a timeout occurs, meaning no data/acks have arrives for some time.

[Bruno] Guillaume, any feedback on this?
One option is to remove those two refinements.
That being said, I feel that the text is readable even if it does not match 
the TCP algorithm. If the name brings confusion, it can easily be changed.


> Sec 6.2.2.4:
> “The rates of increase were inspired from TCP [RFC5681], but it is possible 
> that a different rate of increase for cwin in the congestion avoidance phase 
> actually yields better results due to the low RTT values in most IS-IS 
> deployments.”
> 
> I don’t think this is really a refinement but rather some consideration.
> However, saying this without any further guidance, doesn’t seem really help 
> or even harmful.

[Bruno] OK so this section could be removed.
 
> However, more generally, all in all I’m also not sure a fine-grained 
> congestion control is really needed on a point-to-point link as there is only 
> one receiver that could be overload and in that case you should rather adapt 
> your flow control. I think what you want is to set the flow control parameter 
> in the first place in a reasonable range.

[Bruno] I fully agree with you. We first implemented flow control and had 
very good results with it (no loss of LSPs, very good adaptation of the 
sender rate to the receiver rate). We really had to purposely create 
congestion in order to have any.
However, although there is a point-to-point link between adjacent routers, 
within the receiving router there is typically a "switch", i.e., a possible 
congestion point, between the N high-speed incoming interfaces and the 
control plane (host). The feedback we received from multiple implementers is 
that the internals are platform dependent and touchy. They do experience some 
congestion on this path.
So we added congestion control, and this does improve the situation in case 
of congestion.

> If then there is actual congestion for some reason, meaning you detect a 
> constant or increasing loss rate, maybe you want to implement some kind of 
> circuit breaker by stop sending or fall-back to a minimum rate for a while 
> (e.g. see RFC8085 section 3.1.10.). However why would that even happen on a 
> point-point-link?

[Bruno] Falling back to a minimum rate (the current rate, actually 😉) seems 
indeed like a simple and pragmatic solution. However, if congestion happens 
frequently, this would significantly degrade performance. On the other hand, 
we found that reducing cwin was typically helpful.
So maybe we could remove both refinements and add that, minimally, the sender 
can fall back to the current rate of one LSP every 33 ms (or whatever value 
was used before this document).

> 
> Sec 6.2.3
> “Senders SHOULD limit bursts to the initial congestion window.“ I don’t think 
> this is a requirement because based on your specified algorithm this happens 
> automatically. The initial window is LPP which is also the max number of PDUs 
> that can be asked in on PSNP thus this is also the maximum number of packets 
> you can sent out in receipt go the PSNP (given you don’t let the cwin growth 
> beyond what’s ready to send). However, you have this new parameter for the 
> LSP burst window. That’s what should limit the burst size (and be rather be 
> smaller than LPP/the initial window). 

[Bruno] OK.
So is your proposal the change below?
OLD: Senders SHOULD limit bursts to the initial congestion window. A sender 
with knowledge that the receiver can absorb larger bursts, such as by receiving 
the LSP Burst Size sub-TLV from this receiver may use a higher limit.
NEW: Senders SHOULD limit bursts to the LSP Burst Size.

>Also pacing is used over Internet link to avoid overloading small buffer on 
>the path, as the buffer size of the network element is unknown. This is not 
>the case in your point-to-point scenario. If you know all buffer sizes, it 
>probably sufficient to limit the burst size accordingly.

[Bruno] Agreed, but the capacity and behavior of the "internal switch" in the 
receiving router are unknown in general, and people don't want to know them, so 
that control plane implementations can remain platform independent.
Also, a receiver may be connected to 100 other IS-IS routers sending their 
LSPs, and the buffer may be shared by all senders. So pacing seemed like a good 
thing in the general case.
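For illustration, pacing simply spreads a window of transmissions evenly over an interval instead of sending them back-to-back (a toy sketch; real implementations would use timers or a token bucket, and the function name is an assumption):

```python
def pacing_schedule(num_lsps, window_ms, start_ms=0.0):
    """Return evenly spaced send times (in ms) for num_lsps LSPs
    across window_ms, instead of one back-to-back burst."""
    if num_lsps == 0:
        return []
    gap = window_ms / num_lsps
    return [start_ms + i * gap for i in range(num_lsps)]
```

When many senders share one receiver-side buffer, spacing transmissions this way reduces the chance that simultaneous bursts overflow it.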

> 
> Sec 6.2.4
> “If the values are too low then the transmitter will not use the full 
> bandwidth or available CPU resources.” As a side note, I hope that fully 
> utilising the link or CPU just with LSP traffic is not the goal here. Which 
> is another reason why you might not need a fine-grained congestion control: 
> congestion control has two goals, avoid congestion but also fully utilise the 
> link -> I think you only need the former.

[Bruno] You are correct that fully utilizing the link or the CPU is not the 
goal. I would even say that fully utilizing the link would be well above the 
capability of IS-IS. But in our tests we did fully utilize the CPU on the 
receiver when we used 10 senders in parallel. So essentially each sender was 
using 10% of the receiver's CPU. But this included all the extra work done by 
IS-IS. And we had no loss of LSPs, and the rate of each sender was controlled 
by the rate of the receiver (just by using flow control, so no magic, but still 
very efficient).

> 
> Sec 6.3
> This section is entirely not clear to me. There is no algorithm described and 
> I would not know how to implement this, also because you interchangeably use 
> the terms congestion control and flow control. Further, you say that no input 
> signal from the receiver is needed; however, if you want to send at the rate 
> of acknowledgements received, that is an input from the receiver. However, 
> the window-based TCP-like algorithm actually does implicitly exactly that: 
> it only sends new data if an acknowledgement is received. It further also 
> takes the number of PDUs that are acknowledged into account because that can 
> be configured. If you don’t do that, your sending rate will get lower and 
> lower.

[Bruno] I agree with you.
That section served two purposes:
-  to indicate that the signaling specified on the receiver side (defined in 
sections 4 and 5) may actually be used by different congestion control 
algorithms.
- as the result of a compromise when we had to merge two competing proposals at 
the time of WG adoption.

Les and Marek, it's up to you to reply on your section.

Alternatively, the last paragraph of section 6.1 "Overview" probably already 
covers the first objective and, to be transparent, that sentence is inspired by 
RFC 9002, section 7 (QUIC Loss Detection and Congestion Control).
As for the second objective, the text could possibly be moved to another 
document. But we would probably need WG feedback on this.
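As a rough illustration of the ack-clocking Mirja describes (a toy sketch under assumed names, not the draft's algorithm): in-flight LSPs are capped by the window, and each PSNP credits the sender with the number of LSPs it acknowledges, which is what keeps the effective sending rate from decaying.

```python
class AckClockedSender:
    """Toy ack-clocked sender: sending is gated by acknowledgements."""

    def __init__(self, cwnd):
        self.cwnd = cwnd        # window, in LSPs
        self.in_flight = 0      # LSPs sent but not yet acknowledged

    def sendable(self, queued):
        """How many of the queued LSPs may be sent right now."""
        return min(queued, self.cwnd - self.in_flight)

    def on_send(self, n):
        self.in_flight += n

    def on_psnp(self, acked):
        # Credit every acknowledged PDU; crediting only one per PSNP
        # would make the sending rate get lower and lower over time.
        self.in_flight -= acked
```

This is only meant to make Mirja's point concrete: the "rate of acknowledgements" is itself an input signal from the receiver.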

 
> Some small nits:
> - Sec 4: advertise flooding-related parameters parameters -> advertise 
> flooding-related parameters - Sec 5.1: PSNP PDUs -> PSNPs or PSN PDUs - Sec
> 5.2: Sequence Number Packets (SNPs) -> probably: Sequence Number PDUs (SNPs)? 
> - Sec 6.2.1.1.: leave space for CSNP and PSNP (SNP) PDUs -> leave space for 
> CSNPs and PSNPs  ?

[Bruno] Thank you for your careful review (of ISO terms...). Corrected.


Again, thank you very much for your careful review. Much appreciated and very 
useful.
I'll upload a new version of the draft once we have converged, or whenever you 
believe that this would help (e.g., when you ack a significant set of the 
proposed changes).

Best regards,
--Bruno

 
> 
>
____________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.
_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr