Hi Linda: my replies are inline, marked with [DA]. Reverting to plain text.
Dave's issue 1) The first justification example in the motivation section is just plain silly, yet I keep seeing it over and over in IETF discussions. No one would ever design and implement the deployment of any single application that way, let alone all apps attached to every TOR in a cloud.

[Linda] The motivation is showing that today's server virtualization and business demands mean that applications in one subnet are no longer necessarily placed close together. These are the same drivers as for NVO3. If you go to this week's VMWorld, you will see more. The motivation doesn't say anything along the lines of "all apps attached to every TOR in a cloud".

[DA] IMO I might like the flexibility to put anything anywhere and not have to think about it, but I am introducing latency, degrading scalability, and consuming excessive system bandwidth when I do so, so I will likely think about it anyway and co-locate communicating VMs as much as possible.

[DA] And my summarization of the example could use a rephrasing. I assumed that if I had one TOR for which every VM instance was part of a unique non-overlapping subnet, I would have others... which is what I actually meant, not that an app had an end point attached to every TOR. I hope this is clearer. VM placement affinity rules would NEVER see absolutely every local VM belonging to a different tenant network and requiring connectivity only to remote VMs.

[Linda] "VM placement affinity rules" are set by operators. One operator can set a rule of not allowing different subnets to be placed under one rack, sacrificing the flexibility of placement. Other operators may prefer the flexibility and set looser rules.

[DA] I know which one I would contract with. I would co-locate VMs as much as possible within the limits of my failure-risk constraints, and as an operator would mix very risk-constrained VMs with those less constrained in common equipment exactly to avoid the pathological case outlined in the motivation, which says to me this is a solution to a non-problem unless the goal is to circumvent near-malicious stupidity in the design of the provisioning algorithms used in the cloud management system.

[Linda] You can still implement a rule requiring hosts of one subnet to be placed together under one ToR; then there is no problem. But that doesn't mean we don't need to evolve the network to enable operators to utilize the new technology for flexibility and elasticity.

[DA] If flexibility means latency and needing much bigger switches, go for it and let me know how it turns out ;-)

Dave's issue 2) I believe the notion of an ARP proxy and the use of MAC-NAT for scaling are divisible concepts and each can be examined in isolation.

[Linda] Agree with your point here. The draft really has two portions: the Proxy Gateway and caching the entries.

[DA] Great.

a. If I want an ARP proxy system, Ethernet already defines the capability to map frames with a specific EtherType to a distinct VLAN, so ARP transactions (0x806) can be separated out of the datastream using existing standardized technology and delivered to a remote proxy ARP system of any arbitrary design, no further standardization required.

[Linda] There can be hosts (applications) from multiple VLANs placed on the rack. Do you mean giving them different EtherTypes, so the traffic is mapped to another VLAN? Does that mean each subnet will need two VLANs?
[DA] I was not planning on designing such a system off the top of my head, merely to point out that such a capability exists, but if I actually needed a proxy ARP system and implemented it this way, yes, it would use a distinct VID, separate from the regular dataplane, for connectivity to the ARP proxy, from which the ARP system could infer the specific tenant subnet.

[Linda] Besides, you can't dictate what EtherType the applications being instantiated on servers will use.

[DA] I wasn't... bog-standard 0x806.

[Linda] Can you elaborate more on this "ARP transactions"? If it works well, we can document it in the draft and replace the ARP proxy currently described.

[DA] ARP request/reply is what I meant. That's all. I'm not sure it replaces the proxy, and on first blush I don't think the way it works in the draft can be easily replaced, as the ARP needs to resolve to the egress MAC-NAT function for the rest of it to work.

b. An alternative to MAC-NAT (a form of summarization in a flat dataplane) is MAC-in-MAC (a form of summarization via hierarchical stacking).

[Linda] MAC-in-MAC (IEEE 802.1ah) doesn't summarize all remote hosts with a common MAC.

[DA] Excuse me? It encapsulates frames from a remote BEB with a header that includes the B-MAC of the remote BEB, which is what appears in all core forwarding tables. If you want to call it something else, fine, but it is functionally equivalent.

[Linda] All switches and hosts attached to the boundary Edge and the Backbone Edge are exposed to all the remote hosts' MAC/VLAN addresses.

[DA] Ahhh, you are worried about compressing the MAC state in the ingress TOR and replacing it with state in the egress, in a new function requiring a technology change and a separate table. I'd concede you will get some benefit in theoretical scalability, but individual TOR chips these days can handle north of 250,000 MAC addresses in on-chip tables, so again, unless you are focused on an extreme corner case which I do not subscribe to, I do not see an issue with current technology, which is already standardized and deployed and would have identical scaling properties. The NVO3 class of solutions, while not yet standard, has similar properties.

c. MAC-NAT itself means the e2e path has a pseudo-L3 hop in it, which effectively prevents all L2 protocols (e.g. CFM, DCB, etc.) from being able to instrument connectivity e2e, so (for example) to diagnose what looks like an L2 hop I would need to use both ETH-LB and ICMP ping, and even then with questionable ability to fault-sectionalize after the MAC-NAT function.

[Linda] CFM is among switches, not among end stations. The Proxy Gateway is to represent all remote end stations (or hosts) by their Gateway addresses. Switch addresses are never an issue because there are only a very limited number of switches in the network. Can you elaborate on this "pseudo L3 hop"? ETH-LB and ICMP are also for switch nodes; they are not for end stations.

[DA] What I mean by a pseudo hop is that the MAC-NAT function receives a frame and needs to resolve the actual MAC address for the end station. As it summarizes (many end stations are compressed to a single MAC address of the NAT function), it is not a clean pass-through for any L2 frames. There is an L3-to-MAC resolution step in the middle of the path.

[DA] I would consider a hypervisor vSwitch to be a valid MEP in any operational system and would expect testability to it, and it is not an L3-addressed end point. I do not have this with the SARP proposed solution.

I hope this clarifies my issues!
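To make point (a) a bit more concrete, here is a rough sketch (in Python, purely illustrative; the VID values are invented and nothing here comes from the draft) of the kind of EtherType-based classification I mean: an ARP frame (0x0806) gets re-tagged onto a dedicated VLAN that carries it to the proxy, while everything else stays on its original VLAN.

    # Sketch only: re-tag ARP frames (EtherType 0x0806) onto a dedicated
    # "ARP proxy" VLAN and leave all other traffic untouched.
    ETHERTYPE_ARP = 0x0806
    ETHERTYPE_DOT1Q = 0x8100
    ARP_PROXY_VID = 4000        # hypothetical VID carrying ARP to the proxy

    def classify_and_retag(frame: bytes) -> bytes:
        """Return the frame re-tagged onto ARP_PROXY_VID if it is ARP."""
        ethertype = int.from_bytes(frame[12:14], "big")
        if ethertype == ETHERTYPE_DOT1Q:
            if int.from_bytes(frame[16:18], "big") == ETHERTYPE_ARP:
                # Rewrite the VID in the existing 802.1Q tag, keep PCP/DEI bits.
                tci = int.from_bytes(frame[14:16], "big")
                new_tci = (tci & 0xF000) | ARP_PROXY_VID
                return frame[:14] + new_tci.to_bytes(2, "big") + frame[16:]
        elif ethertype == ETHERTYPE_ARP:
            # Untagged ARP: push an 802.1Q tag carrying the ARP-proxy VID.
            tag = ETHERTYPE_DOT1Q.to_bytes(2, "big") + ARP_PROXY_VID.to_bytes(2, "big")
            return frame[:12] + tag + frame[12:]
        return frame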
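For point (b), the MAC-in-MAC summarization I am referring to is simply the ingress BEB wrapping the customer frame in a backbone header, so that only B-MACs ever appear in core forwarding tables. A minimal sketch of the encapsulation (B-VID and I-SID values invented, and the I-TAG encoding is only approximate):

    # Sketch only: 802.1ah-style encapsulation at an ingress BEB.  The core
    # forwards on the backbone MACs (B-DA/B-SA); customer MACs ride inside.
    def pbb_encapsulate(customer_frame: bytes, b_da: bytes, b_sa: bytes,
                        b_vid: int = 10, i_sid: int = 0x1234) -> bytes:
        b_tag = (0x88A8).to_bytes(2, "big") + (b_vid & 0x0FFF).to_bytes(2, "big")
        i_tag = (0x88E7).to_bytes(2, "big") + (i_sid & 0xFFFFFF).to_bytes(4, "big")
        return b_da + b_sa + b_tag + i_tag + customer_frame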
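And for point (c), the "pseudo L3 hop" is the extra resolution step the egress MAC-NAT/proxy function has to perform before it can deliver a frame locally; the addresses and table contents below are invented for illustration only.

    # Sketch only: frames arrive addressed to the summarizing gateway MAC;
    # the real destination MAC must be recovered from the destination IP
    # before local delivery -- an L3-to-MAC step in an otherwise L2 path.
    GATEWAY_MAC = "02:00:00:00:00:01"        # summarizing MAC advertised remotely

    local_hosts = {                          # destination IP -> real host MAC
        "10.0.1.5": "52:54:00:aa:bb:01",
        "10.0.1.6": "52:54:00:aa:bb:02",
    }

    def egress_rewrite(dst_mac, dst_ip):
        """Rewrite the summarized destination MAC to the real end-station MAC."""
        if dst_mac != GATEWAY_MAC:
            return dst_mac                   # not summarized, pass through
        # This lookup (and whatever local ARP/learning populates it) is what
        # breaks clean end-to-end L2 OAM visibility.
        return local_hosts.get(dst_ip)       # None -> must resolve locally first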
cheers
Dave

----------------------------

Hi all,

This draft has been presented at several intarea face-to-face meetings and has received quite a bit of discussion. It has been difficult to gauge whether the WG is interested in this work or not. This call is being initiated to determine whether there is WG consensus towards adoption of draft-nachum-sarp-06 as an intarea WG draft.

Please state whether or not you're in favor of the adoption by replying to this email. If you are not in favor, please also state your objections in your response.

This adoption call will complete on 2013-09-04.

Regards
Suresh & Julien

_______________________________________________
Int-area mailing list
Int-area@ietf.org
https://www.ietf.org/mailman/listinfo/int-area