Hi Linda: my replies are inline, marked with [DA]. Reverting to plain text.
Dave's issue 1) The first justification example in the motivation section is just plain silly, yet I keep seeing it over and over in IETF discussions. No one would ever design and implement the deployment of any single application that way, let alone all apps attached to every TOR in a cloud.

[Linda] The motivation is showing that today's server virtualization and business demands mean that applications in one subnet are no longer necessarily placed close together. These are the same drivers as for NVO3. If you go to this week's VMWorld, you will see more. The motivation doesn't say anything along the lines of "all apps attached to every TOR in a cloud".

[DA] IMO I might like the flexibility to put anything anywhere and not have to think about it, but I am introducing latency, degrading scalability, and consuming excessive system bandwidth when I do so, so I will likely think about it anyway and co-locate communicating VMs as much as possible.

[DA] And my summarization of the example could use a rephrasing. I assumed that if I had one TOR for which every VM instance was part of a unique non-overlapping subnet, I would have others... which is what I actually meant, not that an app had an end point attached to every TOR. I hope this is clearer. VM placement affinity rules would NEVER see absolutely every local VM belonging to a different tenant network and requiring connectivity only to remote VMs.

[Linda] "VM placement affinity rules" are set by operators. One operator can set a rule of not allowing different subnets to be placed under one rack, sacrificing the flexibility of placement. Other operators may prefer the flexibility and set looser rules.

[DA] I know which one I would contract with. I would co-locate VMs as much as possible within the limits of my failure-risk constraints, and as an operator would mix very risk-constrained VMs with those less constrained in common equipment exactly to avoid the pathological case outlined in the motivation, which says to me this is a solution to a non-problem unless the goal is to circumvent near-malicious stupidity in the design of the provisioning algorithms used in the cloud management system.

[Linda] You can still implement a rule requiring hosts of one subnet to be placed together under one ToR; then there is no problem. But that doesn't mean we don't need to evolve the network to enable operators to utilize the new technology for flexibility and elasticity.

[DA] If flexibility means latency and needing much bigger switches, go for it and let me know how it turns out ;-)

Dave's issue 2) I believe the notion of an ARP proxy and the use of MAC-NAT for scaling are divisible concepts and each can be examined in isolation.

[Linda] Agree with your point here. The draft really has two portions: the Proxy Gateway and caching the entries.

[DA] Great.

a. If I want an ARP proxy system, Ethernet already defines the capability to map frames with a specific EtherType to a distinct VLAN, so ARP transactions (0x806) can be separated out of the datastream using existing standardized technology and delivered to a remote proxy ARP system of any arbitrary design, no further standardization required.

[Linda] There can be hosts (applications) from multiple VLANs placed on the rack. Do you mean giving them different EtherTypes, so the traffic is mapped to another VLAN? Does that mean each subnet will need two VLANs?
[DA] I was not planning on designing such a system off the top of my head, merely to point out that such a capability exists, but if I actually needed a proxy ARP system and implemented it this way, yes, it would use a distinct VID, separate from the regular dataplane, for connectivity to the ARP proxy, from which the ARP system could infer the specific tenant subnet.

[Linda] Besides, you can't dictate what EtherType the applications being instantiated on servers will use.

[DA] I wasn't... bog-standard 0x806.

[Linda] Can you elaborate more on this "ARP transactions"? If it works well, we can document it in the draft and replace the ARP proxy currently described.

[DA] ARP request/reply is what I meant. That's all. I'm not sure it replaces the proxy, and on first blush I don't think the way it works in the draft can be easily replaced, as the ARP needs to resolve to the egress MAC-NAT function for the rest of it to work.

b. An alternative to MAC-NAT (a form of summarization in a flat dataplane) is MAC-in-MAC (a form of summarization via hierarchical stacking).

[Linda] MAC-in-MAC (IEEE 802.1ah) doesn't summarize all remote hosts with a common MAC.

[DA] Excuse me? It encapsulates frames from a remote BEB with a header that includes the B-MAC of the remote BEB, which is what appears in all core forwarding tables. If you want to call it something else, fine, but it is functionally equivalent.

[Linda] All switches and hosts attached to the boundary Edge and the Backbone Edge are exposed to all the remote hosts' MAC/VLAN addresses.

[DA] Ahhh, you are worried about compressing the MAC state in the ingress TOR and replacing it with state in the egress, in a new function requiring a technology change and a separate table. I'd concede you will get some benefit in theoretical scalability, but individual TOR chips these days can handle north of 250,000 MAC addresses in on-chip tables, so again, unless you are focused on an extreme corner case which I do not subscribe to, I do not see an issue with current technology, which is already standardized and deployed and would have identical scaling properties. The NVO3 class of solutions, while not yet standard, has similar properties.

c. MAC-NAT itself means the e2e path has a pseudo-L3 hop in it, which effectively prevents all L2 protocols (e.g. CFM, DCB, etc.) from being able to instrument connectivity e2e, so (for example) to diagnose what looks like an L2 hop I would need to use both ETH-LB and ICMP ping, and even then with questionable ability to fault-sectionalize after the MAC-NAT function.

[Linda] CFM is among switches, not among end stations. The Proxy Gateway is to represent all remote end stations (or hosts) by their Gateway addresses. Switch addresses are never an issue because there are only a very limited number of switches in the network. Can you elaborate on this "pseudo L3 hop"? ETH-LB and ICMP are also for switch nodes; they are not for end stations.

[DA] What I mean by a pseudo hop is that the MAC-NAT function receives a frame and needs to resolve the actual MAC address for the end station. As it summarizes (many end stations are compressed to a single MAC address of the NAT function), it is not a clean pass-through for any L2 frames. There is an L3-to-MAC resolution step in the middle of the path.

[DA] I would consider a hypervisor vSwitch to be a valid MEP in any operational system and would expect testability to it, and it is not an L3-addressed end point. I do not have this with the SARP proposed solution.

I hope this clarifies my issues!
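To make point (a) a bit more concrete, here is a rough sketch (in Python, purely illustrative; the VID values are invented and nothing here comes from the draft) of the kind of EtherType-based classification I mean: an ARP frame (0x0806) gets re-tagged onto a dedicated VLAN that carries it to the proxy, while everything else stays on its original VLAN.

    # Sketch only: re-tag ARP frames (EtherType 0x0806) onto a dedicated
    # "ARP proxy" VLAN and leave all other traffic untouched.
    ETHERTYPE_ARP = 0x0806
    ETHERTYPE_DOT1Q = 0x8100
    ARP_PROXY_VID = 4000        # hypothetical VID carrying ARP to the proxy

    def classify_and_retag(frame: bytes) -> bytes:
        """Return the frame re-tagged onto ARP_PROXY_VID if it is ARP."""
        ethertype = int.from_bytes(frame[12:14], "big")
        if ethertype == ETHERTYPE_DOT1Q:
            if int.from_bytes(frame[16:18], "big") == ETHERTYPE_ARP:
                # Rewrite the VID in the existing 802.1Q tag, keep PCP/DEI bits.
                tci = int.from_bytes(frame[14:16], "big")
                new_tci = (tci & 0xF000) | ARP_PROXY_VID
                return frame[:14] + new_tci.to_bytes(2, "big") + frame[16:]
        elif ethertype == ETHERTYPE_ARP:
            # Untagged ARP: push an 802.1Q tag carrying the ARP-proxy VID.
            tag = ETHERTYPE_DOT1Q.to_bytes(2, "big") + ARP_PROXY_VID.to_bytes(2, "big")
            return frame[:12] + tag + frame[12:]
        return frame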
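For point (b), the MAC-in-MAC summarization I am referring to is simply the ingress BEB wrapping the customer frame in a backbone header, so that only B-MACs ever appear in core forwarding tables. A minimal sketch of the encapsulation (B-VID and I-SID values invented, and the I-TAG encoding is only approximate):

    # Sketch only: 802.1ah-style encapsulation at an ingress BEB.  The core
    # forwards on the backbone MACs (B-DA/B-SA); customer MACs ride inside.
    def pbb_encapsulate(customer_frame: bytes, b_da: bytes, b_sa: bytes,
                        b_vid: int = 10, i_sid: int = 0x1234) -> bytes:
        b_tag = (0x88A8).to_bytes(2, "big") + (b_vid & 0x0FFF).to_bytes(2, "big")
        i_tag = (0x88E7).to_bytes(2, "big") + (i_sid & 0xFFFFFF).to_bytes(4, "big")
        return b_da + b_sa + b_tag + i_tag + customer_frame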
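And for point (c), the "pseudo L3 hop" is the extra resolution step the egress MAC-NAT/proxy function has to perform before it can deliver a frame locally; the addresses and table contents below are invented for illustration only.

    # Sketch only: frames arrive addressed to the summarizing gateway MAC;
    # the real destination MAC must be recovered from the destination IP
    # before local delivery -- an L3-to-MAC step in an otherwise L2 path.
    GATEWAY_MAC = "02:00:00:00:00:01"        # summarizing MAC advertised remotely

    local_hosts = {                          # destination IP -> real host MAC
        "10.0.1.5": "52:54:00:aa:bb:01",
        "10.0.1.6": "52:54:00:aa:bb:02",
    }

    def egress_rewrite(dst_mac, dst_ip):
        """Rewrite the summarized destination MAC to the real end-station MAC."""
        if dst_mac != GATEWAY_MAC:
            return dst_mac                   # not summarized, pass through
        # This lookup (and whatever local ARP/learning populates it) is what
        # breaks clean end-to-end L2 OAM visibility.
        return local_hosts.get(dst_ip)       # None -> must resolve locally first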
cheers
Dave

----------------------------

Hi all,

This draft has been presented at several intarea face-to-face meetings and has received quite a bit of discussion. It has been difficult to gauge whether the WG is interested in this work or not. This call is being initiated to determine whether there is WG consensus towards adoption of draft-nachum-sarp-06 as an intarea WG draft.

Please state whether or not you're in favor of the adoption by replying to this email. If you are not in favor, please also state your objections in your response.

This adoption call will complete on 2013-09-04.

Regards
Suresh & Julien

_______________________________________________
Int-area mailing list
Int-area@ietf.org
https://www.ietf.org/mailman/listinfo/int-area