Hi Robert,

>> I would definitely like to understand it. To my mind, link/node failures
>> have at least an area-wide impact. The scope of a congestion event and
>> failures are extremely similar to my mind.
>
> That is 100% correct that link/node failures have an area-wide impact.
>
> But I am not talking about impact, but locality of the trigger.
>
> I think you can agree that link/node failure will be detected by directly
> attached PLRs only. My point is that congestion is, on the other hand,
> caused by traffic flow(s) which are not sourced in P node(s) in the middle
> of the network. They usually enter the network from the edge (ingress) and
> go to the other side (egress).
Agreed. How is this relevant?

>> As discussed in the presentation and in the draft, when a prefix is
>> activated, TTE shifts the backup paths to be in ECMP with the primary
>> path. This shifts a portion of the traffic for the affected prefix onto
>> the bypass path.
>
> Ahh ok so you call a backup path an ECMP path with the primary path (or
> primary ECMP paths). Ok. I would rather call them bypass but no issue
> here.

The semantics are the same, regardless of the name.

>> Yes, we are only discussing unicast. Congestion can be a local
>> phenomenon. 101Gb of traffic funneled into a 100Gb link that drops the
>> excess 1Gb of traffic will effectively ‘protect’ the downstream 100Gb
>> links.
>
> Do you plan to support PE-CE link congestion by use of PE-CE protection
> in the case of multihomed customer sites?

We support this on all links.

>> Exactly. Since repair requires that you provision bandwidth for
>> failures, their assumption is that bypass links are not completely
>> congested. If all of your links are totally saturated, then FRR does not
>> help at all and neither does TTE. That’s not a use case that’s
>> interesting to address.
>
> My point is not about capacity planning during network design. We are
> already talking about the case where our provisioning assumptions are
> gone. So if the primary link got congested, there is zero assurance that
> the backup links have room for the excess traffic at the time of bypass
> activation.

As I’ve tried to explain: TTE cannot create bandwidth. In a case where the
network is uniformly congested, TTE cannot address the issue. That’s not its
use case. No mechanism is going to be able to fix that: if the load exceeds
the capacity of the network, even an oracle is not going to find a path
placement that avoids loss.

>> Please recall that there are two thresholds: high and low. Activating a
>> prefix may shift some traffic to the bypass. Typically, we would expect
>> that after a few prefixes are shifted, enough load would be shed so that
>> utilization lies between the low and high thresholds.
>>
>> TTE is iterative and continuous: if flow selection does not alleviate
>> congestion, more flows will be selected in the next iteration. Similarly,
>> if flow selection overshoots, it will self-correct by deactivating
>> prefixes until utilization lies between thresholds.
>
> Prefix or flow?

Sorry, I tend to be a little bit haphazard in my terminology. TTE activates
specific prefixes and labels. Upon activation of the backup path, it forms
(or adds to) an ECMP group for the prefix or label. This ECMP group will
direct some flows onto the backup path. Strictly speaking, we should be
talking about prefix (and label) selection.

> Are you just modifying the rewrite for a prefix(es), or additionally
> applying a flow ACL to select what goes into the bypass?

At this time, we select prefixes (filtered by policy) and form an ECMP
group. The usual mechanisms for ECMP hash bucket selection will apply (see
the sketch just below).

> If you are just doing this based on dst prefix then I could see how it
> could work for pure Internet transit where there is no encapsulation used
> in the network.
>
> But such networks (even for a pure ISP) are in the vast majority history.
> For various reasons most networks use PE-PE encapsulation of some sort.
> Take MPLS (LDP or SR) ... Node will be receiving packets with label L -
> the prefix on the packet plays no role in forwarding here.

Conceptually, this can also be done equally on MPLS LSPs. Entropy labels are
recommended.
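To make the mechanics concrete, here is a rough sketch in Python. This is
purely illustrative and not text from the draft; the class and interface
names (EcmpGroup, "if-primary", and so on) are my own invention. It just
shows how adding bypass next-hops to a prefix's ECMP group lets the ordinary
per-flow hash move only a fraction of the flows:

import zlib

class EcmpGroup:
    def __init__(self, primary_nhops):
        self.members = list(primary_nhops)   # primary path(s) only

    def activate_bypass(self, bypass_nhops):
        # TTE activation: the bypass joins the group alongside the primary.
        self.members += [n for n in bypass_nhops if n not in self.members]

    def deactivate_bypass(self, bypass_nhops):
        # Self-correction: drop the bypass members; traffic returns to the
        # primary path(s).
        self.members = [n for n in self.members if n not in bypass_nhops]

    def select(self, flow_key):
        # Usual ECMP hash-bucket selection over the 5-tuple (or the entropy
        # label, in the MPLS case): only flows that hash onto a bypass
        # member actually move.
        h = zlib.crc32(flow_key.encode())
        return self.members[h % len(self.members)]

group = EcmpGroup(primary_nhops=["if-primary"])
group.activate_bypass(["if-bypass"])
for flow in ("10.0.0.1->10.1.1.1:443", "10.0.0.2->10.1.1.9:80"):
    print(flow, "->", group.select(flow))

With a 1:1 primary/bypass group, roughly half of the flows move; one could
presumably weight the members to shift a smaller fraction.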
> So you need quite a deep ACL to go beyond the MPLS header (even without
> MNA) to recognize the flow. Needless to say you need pretty powerful
> local s-flow capabilities to recognize those flows in the first place.

We’re not doing that at this point, just prefix or label granularity.

> If not then you are likely not going to shift the excess 1 GB of a 101 GB
> demand but perhaps 30-60 GB.

If the only traffic on the link is elephant flows without any kind of
entropy or distribution, then TTE is not recommended. The prefix/label
selection algorithm at this point is random and relies on the Law of Large
Numbers to select prefixes/labels. Pragmatically, that suggests that a link
should be carrying at least 50 flows with a Gaussian distribution of
bandwidth demands. (The toy loop below shows how this converges.)

Tony
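P.S. For the curious, here is a toy model of the iterative selection loop.
Again, this is purely illustrative and not the draft's algorithm: the
threshold values, the flow model, and the 2-way ECMP assumption (half of an
activated prefix's flows hash onto the bypass) are all my own.

import random

LOW, HIGH = 0.70, 0.90   # low/high utilization thresholds (assumed values)
CAPACITY = 100.0         # Gb/s link capacity

random.seed(1)
# ~50 prefixes carrying a Gaussian spread of a 101 Gb/s aggregate demand.
load = {p: random.gauss(2.0, 0.5) for p in range(50)}
scale = 101.0 / sum(load.values())
load = {p: v * scale for p, v in load.items()}

active = set()           # prefixes currently shifted onto the bypass

def utilization():
    # With a 2-way ECMP group, roughly half of an activated prefix's flows
    # hash onto the bypass and leave this link.
    on_link = sum(v * (0.5 if p in active else 1.0) for p, v in load.items())
    return on_link / CAPACITY

for step in range(20):
    util = utilization()
    if util > HIGH:
        inactive = [p for p in load if p not in active]
        if inactive:
            active.add(random.choice(inactive))       # random selection
    elif util < LOW and active:
        active.discard(random.choice(list(active)))   # overshoot: back off
    else:
        break                                         # converged
print(f"activated {len(active)} prefixes, utilization {utilization():.2f}")

With many modest flows, each activation sheds a statistically predictable
slice of load, so the loop settles between the thresholds rather than
oscillating; with a handful of elephants it would overshoot badly, which is
exactly the 30-60 GB concern above.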
