On 2019-01-22 12:02 MET, Pavel Lunin wrote:

>> (I am myself running a mostly DC network, with a little bit of campus
>> network on the side, and we use bandwidth-based metrics in our OSPF.
>> But we have standardized on using 3 Tbit/s as our "reference bandwidth",
>> and Junos doesn't allow us to set that, so we set explicit metrics.)
> As Adam has already mentioned, DC networks are becoming more and more
> Clos-based, so you basically don't need OSPF at all for this.
>
> Fabric uplinks, Backbone/DCI and legacy still exist though, however in
> the DC we tend to ECMP it all, so you normally don't want to have unequal
> bandwidth links in parallel in the DC.

Our network is roughly spine-and-leaf. But we have a fairly small net
(two spines, around twenty leaves, split over two computer rooms a couple
of hundred meters apart the way the fiber goes), and it doesn't make
economic sense to make it a perfectly pure folded Clos network. So there
are a couple of leaf switches that are just layer 2 with spanning tree,
and the WAN connections to our partner in the neighbouring city go
directly into our spines instead of into "peering leaves". (The border
routers for our normal Internet connectivity are connected as leaves to
our spines, though, but they are really our ISP's CPE routers, not ours.)

Also, the leaves have wildly different bandwidth needs. Our DNS, email
and web servers don't need as much bandwidth as a 2000-node HPC cluster,
which in turn needs less bandwidth than the storage cluster for LHC data.
Most leaves have 10G uplinks (one to each spine), but we also have leaves
with 1G and with 40G uplinks. I don't want a leaf with 1G uplinks
becoming a "transit" node for traffic between two other leaves in (some)
failure cases, because an elephant flow could easily saturate those 1G
links. Thus, I want higher costs for those links than for the 10G and
40G links. Of course, the costs don't have to be exactly
<REFERENCE_BW> / <ACTUAL_BW>, but there needs to be some relation to
the bandwidth.

> Workarounds happen, sometimes you have no more 100G ports available and
> need to plug, let's say, 4x40G "temporarily" in addition to two existing
> 100G which are starting to be saturated.
> In such a case you'd rather consciously decide whether you want to ECMP
> these 200 Gigs among six links (2x100 + 4x40) or use the 40G links as a
> backup only (might be not the best idea in this scenario).

Right. I actually have one leaf switch with unequal bandwidth uplinks.
On one side, it uses 2×10G link aggregation, but on the other side, I
could use an old Infiniband AOC cable, giving us a 40G uplink. In that
case, I have explicitly set the two uplinks to have the same costs.

/Bellman, NSC
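For the archives, a quick sketch of the bandwidth-derived metrics
discussed above. The 3 Tbit/s reference bandwidth is from the thread;
the helper name and the list of link speeds are illustrative, not any
vendor's actual implementation:

```python
# Sketch: deriving explicit OSPF interface costs from a large reference
# bandwidth, the way one would when the router itself cannot be told to
# use a 3 Tbit/s reference.

REFERENCE_BW = 3_000_000  # reference bandwidth in Mbit/s (3 Tbit/s)

def ospf_cost(link_bw_mbps: int) -> int:
    """Classic OSPF cost: reference bandwidth / link bandwidth, minimum 1."""
    return max(1, REFERENCE_BW // link_bw_mbps)

for speed in (1_000, 10_000, 40_000, 100_000):  # 1G, 10G, 40G, 100G
    print(f"{speed // 1000}G uplink -> metric {ospf_cost(speed)}")
```

With this reference, a 1G uplink costs 3000 against 300 for 10G and 75
for 40G, so a 1G leaf never looks attractive as transit while the
metrics still track bandwidth.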
_______________________________________________
juniper-nsp mailing list
juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp