TL;DR: metrics aren't purely a design/academic decision; they are operational too.
On Thu, 24 Jan 2019 at 09:27, Saku Ytti <s...@ytti.fi> wrote:
> I don't disagree, I just disagree that there are common case where
> bandwidth is most indicative of good SPT.

If by "good" you mean "shortest" (fewest hops) then I disagree with you: bandwidth is usually indicative of the path with the fewest hops (not always, but usually). In any reasonable hierarchical design northbound links aren't going to be of a lower speed than southbound links. Taking Adam's example of a folded Clos network as a theoretical, utopian, text-book example, you also wouldn't have east-west links between leaves, and if you did they wouldn't be as fast as or faster than your northbound links. The problem is that in reality no SP network looks as neat and tidy, or is as simple, as a Clos network; see below.

> Consider I have
>
> 10GE-1:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - PE2
>
> 10GE-2:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - PE2
>
> 10GE-3:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - PE2
>
> 1GE:
> PE1 - PE2
>
> In which realistic topology
> a) in 10GE-1 + 1GE, I want to prefer the 10GE between PE?

As soon as you have 1.000000001Gbps of traffic to shift (see my previous email).

And this is where reality kicks in: why would you have a PE with a 10G and 1G uplink? In the hypothetical Clos design you simply wouldn't have mixed-speed links facing northbound. In the real SP networking world you wouldn't have a 10G uplink if you didn't have >1Gbps of provisioned downstream connectivity, otherwise you're wasting capex/opex (except in rare circumstances, like a carrier promotion selling 10G for the price of 1G or something, but you probably hadn't planned for that). So, assuming there is a reason you have bandwidth-asymmetrical uplinks in your topology, it's probably downstream bandwidth related.
It could also be upstream related though; upstream link upgrades don't happen in a fixed time or perfectly symmetrically. Maybe the road closure is delayed, route planning changes, a PoP closes, transmission equipment is being upgraded, and you end up upgrading one northbound circuit in 3 months while the other takes 12 months. To go full circle to your original point, bandwidth is dictating the "best" SPT here, where "best" means "to avoid congestion during normal operations, not times of exceptional operations, which is when we look to QoS for help".

This is what happens in the "real world" and not in Clos networks. We might want diverse connections to a remote PoP and only one carrier has 10G of capacity there, so our backup link has to be 1G. We actually have more than 1G of provisioned downstream connectivity, but that is all we can get unless we want 2x10G from the same carrier and no resilience. Maybe we can bond a few 1G links from the 2nd carrier and have 10G + 5G backup.

To be clear, I don't approve of such a design. My point is that in the real world, where things aren't simple, circuit costs are higher than expected, we don't have enough 100G or 10G ports, the project has been under-budgeted, and the lead time on the new router from the vendor is 12 months, not the promised 3, we end up with these kinds of weird asymmetrical topologies and we have to use a bandwidth-based metric to route traffic.

> b) in 10GE-2 + 1GE, I want to balance between the paths

So, from a purely technical perspective, if you did per-flow load balancing it would work. Should you do it? I'd say hell no. But not because of anything to do with IGPs. The operational complexity of troubleshooting such a topology is too high in this scenario; imagine if each one of those 10G links between P nodes was from a different carrier: it would be a case of service credits lining up ready to be given away.

> c) in 10GE-3 + 1GE, I want to prefer the 1GE

When you actually have some bandwidth-critical services which are <= 1Gbps.
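
For reference, the three quoted cases a), b) and c) line up with what a plain inverse-bandwidth metric would produce. The sketch below is illustrative only. It assumes cost = reference bandwidth / link speed with a 10G reference (so a 10GE link costs 1 and a 1GE link costs 10), which neither mail states explicitly, and `link_cost`, `build` and `first_hops` are hypothetical helper names, not anything from a router OS. Under that assumption the 9-link 10GE path beats the 1GE (9 < 10), the 10-link path ties with it (ECMP), and the 11-link path loses to it.

```python
import heapq
from collections import defaultdict

REF_BW_GBPS = 10  # assumption: reference bandwidth, cost = ref / link speed


def link_cost(bw_gbps):
    """Inverse-bandwidth cost, floored at 1."""
    return max(1, REF_BW_GBPS // bw_gbps)


def build(p_routers):
    """PE1 - P1 ... Pn - PE2 over 10GE, plus a direct 1GE PE1 - PE2 link."""
    chain = ["PE1"] + [f"P{i}" for i in range(1, p_routers + 1)] + ["PE2"]
    graph = defaultdict(list)
    for a, b in zip(chain, chain[1:]):
        graph[a].append((b, link_cost(10)))
        graph[b].append((a, link_cost(10)))
    graph["PE1"].append(("PE2", link_cost(1)))
    graph["PE2"].append(("PE1", link_cost(1)))
    return graph


def first_hops(graph, src, dst):
    """Dijkstra that keeps every equal-cost first hop from src towards dst."""
    dist = {src: 0}
    hops = {src: set()}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node]:
            nd = d + cost
            via = {nbr} if node == src else hops[node]
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                hops[nbr] = set(via)
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                hops[nbr] |= via  # equal cost: remember both first hops
    return dist[dst], sorted(hops[dst])


if __name__ == "__main__":
    for n in (8, 9, 10):  # 10GE-1, 10GE-2, 10GE-3
        print(n, first_hops(build(n), "PE1", "PE2"))
```

With 8 P routers the only first hop is P1 (the 10GE chain), with 9 both P1 and PE2 are kept (equal cost), and with 10 only the direct 1GE link to PE2 remains. Changing `REF_BW_GBPS` or the floor in `link_cost` moves those crossover points, which is the sense in which the bandwidth metric is acting as a stand-in for a role decision.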

> All these seem nonsensical, what actually is meant '1GE has role Z,
> 10GE has role X, have higher metric for role Z', regardless what the
> actual bandwidth is. I just happens that bandwidth approximates role
> in that topology, but desired topology is likely achieved with
> distance vector or simple role topology and bandwidth is not relevant
> information.

To me they aren't nonsensical, they are "not ideal" for a specific purpose, i.e. sub-optimal for latency, or operationally more complex.

Going right back to basics: the reason we have a metric at all in the IGP is that there is some reason why the shortest path (number of hops) from A to B isn't the optimal path, so we're using the metric as a weight to influence the SPT calculation. So the question is, why isn't the SPT optimal for you? In the hypothetical Clos model it is, in real life it isn't, so we're always trying to get as close to that as we can.

Metrics aren't just a purely design/academic decision (function based or role based), they are operational too, e.g. breaking up a failure domain or breaking up a change request domain. I've had to move traffic away from a P/PE node because traffic around the core ring was disproportionately distributed, such that the failure of one P node had a much larger impact than the failure of other P nodes.

As I mentioned in my previous email, these issues only go away when you have the kind of luxuries that I, and I expect you, have, like your own dedicated transmission network or enough influence to tell a carrier where to lay fibre next.

Cheers,
James.
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp