Inline ... -Jon
On 31/07/14 09:48 +0000, Vitkovský Adam wrote:
> Hello Peter,
>
> I'd like to ask a couple of questions regarding the design and to confirm
> my understanding, please.
>
> What is the recommended fan-out ratio for Tier3 to Tier2 and Tier2 to
> Tier1, please?
> Tier3 to Tier2 would be 1/4 (so in case one Tier2 device fails, the
> remaining BW still available for a cluster is 75%) (Tier3 device has 4
> ECMP paths).
> Tier2 to Tier1 would be 1/2 (so in case one Tier1 device fails, the
> remaining BW still available for the cluster is 87.5%) (Tier2 device has
> 2 ECMP paths).
>
> Or is it more like 1/32 for Tier3 to Tier2 (Tier3 device has 32 ECMP
> paths), please?

This is highly dependent on the maximum size (number of clusters) of the DC
and the amount of traffic required to be carried between tiers.  YMMV.  If
Tier3 in your design is a 64-port ToR, it may be a bit extreme to use half
the ports as uplinks, although it is certainly possible.  Most cluster
designs choose 1/4 or 1/8, but higher ratios are certainly possible.
Making a recommendation that applies to all DC sizes and traffic
requirements is probably not feasible.

> 8.2.1. Collapsing Tier-1 Devices Layer.
> - I think that as a result of collapsing the number of Tier1 devices to
> half, the impact of the failure of a single Tier1 device will increase by
> 50%.
> - Thus, wouldn't it be more desirable to leave the same number of Tier1
> devices and only add links from a particular Tier2 device to
> another/neighboring pair of Tier1 devices, please?
> - The reduction in port capacity would remain the same.
> - However, the impact of a failure of a single link or a single Tier1
> device would be unchanged.
>
> 8.2. Route Summarization within Clos Topology.
> - Since you have mentioned that all the devices are preferably of the
> same type, to accommodate REQ2.
> - I'm thinking they would probably have the same FIB capacity, right?
> - So if a Tier1 device can hold all the DC routes, then Tier2 and Tier3
> devices can as well, right?
> - If the FIB size differs between the devices used in the various tiers,
> then summarization is beneficial indeed.

I think you read these sections as disjoint.  Section 8.2.1 only applies if
you desire to do summarization in the Clos; most operators may agree the
trade-off is not worth it and not summarize in the fabric.  These operators
may also have the same FIB capacity on all devices.  Other operators may
desire summarization in the Clos, either because they did not select
devices of the same FIB capacity or because they want to reduce the control
plane exposure as suggested in section 8.2.  Section 8.2.1 explains one way
that could be done and the associated trade-offs of doing it.

> If FIB size is a cause for concern, would it be possible to utilize a
> scheme where servers are grouped into server groups, and then to define
> which server groups need to communicate with which other server groups,
> with everybody, or with the internet, please?
> This way prefixes could be marked, and filters on Tier3 and Tier2 devices
> set accordingly - to only allow the necessary prefixes to be accepted
> from a peer or inserted from BGP into the RIB/FIB.
> The drawback is of course the increased operational complexity of
> maintaining the filters, as well as troubleshooting.
> Though with a clear server-groups-to-Tier3-devices (or clusters) mapping
> scheme, the filters would be set once and then maintained only
> occasionally.
> Also, with a clear communities scheme, troubleshooting would be
> straightforward, I believe.
>
> I'm thinking like in MPLS VRFs: if a particular PE (Tier3 device) is
> serving only a subset of VRFs (server groups), it doesn't really need to
> hold all the DC routes.

Implicit in the requirements is full reachability to server subnets in the
design (from every other server subnet, and typically with a default route
providing external connectivity to the Clos as outlined).
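The FIB-scaling trade-off behind this exchange can be sketched with some
back-of-the-envelope arithmetic.  All numbers below are hypothetical,
chosen only to illustrate the point: a full-reachability Tier3 device
carries every subnet in the DC, while a community-filtered one carries
only the server groups it serves plus a default route.

```python
# Hypothetical DC dimensions, for illustration only.
CLUSTERS = 32              # assumed number of clusters in the DC
SUBNETS_PER_CLUSTER = 64   # assumed server subnets per cluster

def fib_routes_full() -> int:
    """Routes a Tier3 device holds with full reachability (no filtering)."""
    return CLUSTERS * SUBNETS_PER_CLUSTER

def fib_routes_filtered(groups_served: int, subnets_per_group: int) -> int:
    """Routes held if filters admit only the server groups this Tier3
    device needs to reach, plus one default route for everything else."""
    return groups_served * subnets_per_group + 1

print(fib_routes_full())           # 2048 routes with full reachability
print(fib_routes_filtered(4, 64))  # 257 routes when serving 4 groups
```

The gap grows linearly with DC size, which is why the filtering idea is
attractive when FIB capacity differs between tiers - at the operational
cost of maintaining the filters, as noted above.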
If this is not a requirement, an alternative would be not building one
large-scale Clos network but rather a number of small-scale Clos networks
that are custom-built for such server groups.  Obviously this limits the
fungibility of the equipment deployed.  Also, it should be stated that
operational simplicity is a stated goal of this design.

> 8.3. ICMP Unreachable Message Masquerading.
> Another option is to make the network device perform IP address
> masquerading,
> - Does that mean the network device will respond with its RID/loopback IP
> during traceroute, please?
> - If so, it would then be impossible to pinpoint the link used to forward
> traffic to the next hop, so if there are two IP paths between directly
> connected devices we wouldn't be able to distinguish the failed one.
> - But I guess this kind of setup is not going to be used.

Yes, typically there is only one connection in the design between two
specific devices in different tiers, so if the previous hop responded, and
the device responding where TTL is exceeded uses its RID as the source,
this effectively identifies the link.  Sometimes there is more than one
link, but in those cases the operator may be using LAG, making this valid
as well.  All of section 8 presents options to the design - if the specific
design an operator chooses has multiple non-LAG links between two devices
in separate tiers, and traceroute responses are deemed highly useful, they
may opt for the second option of section 5.2.3 rather than ICMP
masquerading.

> And just some nit-picking.
> 7.1. Fault Detection Timing.
> This feature is sometimes called as "fast fallover"
> - Do you mean "fast external failover", as it only applies to eBGP
> sessions?
> - Or do you mean the "fast peering session deactivation" functionality
> that brings the same behavior to iBGP sessions, please?

It means the first, although more implementations use the word "fallover"
than "failover" in the command.
If it is confusing, we can add "external" to the wording.  This entire
draft is about an EBGP-only design, so the second does not apply.

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
