Darren,

On Oct 31, 2013, at 10:05 AM, Darren Shepherd <darren.s.sheph...@gmail.com> 
wrote:

> Yeah I think it would be great to talk about this at CCC.  I'm
> hesitant to further narrow down the definition of the network.  For
> example, I think OpenStack's Neutron is fundamentally flawed because
> they defined a network as a L2 segment.

OpenContrail implements a Neutron plugin. It uses the Neutron API to provide 
the concept of a virtual-network. A virtual-network can be a collection of IP 
subnets that work as a closed user group; by configuring a network-policy 
between virtual-networks, the user/admin can define additional connectivity 
for the network. The same functionality can be achieved using the AWS VPC API. 
We have extended the Neutron API with the concept of a network-policy, but 
have not changed the underlying concept of a network; the 1.00 release of the 
software provides an IP-only service to the guest (the latest release also 
provides fallback bridging for non-IP traffic). While I don't have a firm 
opinion on the Neutron API, it does not limit a network to being an L2 
segment.
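
As a rough illustration of that model (the names, credentials, and CIDRs below 
are made up, and I'm using the stock python-neutronclient calls rather than 
anything Contrail-specific), a single Neutron network object can carry more 
than one IP subnet:

    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='demo',
                            auth_url='http://keystone:5000/v2.0')

    # One "virtual-network" in the sense used above...
    net = neutron.create_network({'network': {'name': 'frontend'}})['network']

    # ...holding a collection of IP subnets that form a closed user group.
    for cidr in ('10.1.1.0/24', '10.1.2.0/24'):
        neutron.create_subnet({'subnet': {'network_id': net['id'],
                                          'ip_version': 4,
                                          'cidr': cidr}})

Attaching a network-policy between two such networks goes through the Contrail 
API extension; the point is simply that the network is defined in terms of IP 
reachability, not as an L2 segment.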

> In the world of SDN, I think its even more important to keep the
> definition of a network loose.  SDN has the capability of
> completely changing the way we look at L2 and L3.  Currently in
> networking we group things by L3 and L2 concepts as that is how
> routers and switches are laid out today.  As SDN matures and you see
> more flow oriented design it won't make sense to group things using L2
> and L3 concepts (as those become more a physical fabric technology),
> the groups becomes more loose and thus the definition of a network
> should be loose.

I don't believe there is an accepted definition of SDN. My perspective, and 
the goal for OpenContrail, is to decouple the physical network from the 
service provided to the "edge" (the virtual-machines in this case). The goal 
is to allow the physical underlay to be designed for throughput and high 
inter-connectivity (e.g. a Clos topology), while implementing the 
functionality traditionally found in an aggregation switch (the L2/L3 
boundary) in the host.

The logic is that to get the highest server utilization one needs to be able to 
schedule a VM (or LXC) anywhere in the cluster; this implies much greater data 
throughput requirements. The standard operating procedure used to be to aim for 
I/O locality by placing multiple components of an application stack in the same 
rack. In the traditional design you can easily find a 20:1 over-subscription 
between server ports and the actual throughput of the network core.
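
To make that arithmetic concrete (the numbers are illustrative, not from any 
particular deployment):

    # 40 servers per rack, each with a 10 GbE port, behind a top-of-rack
    # switch that has 2 x 10 GbE uplinks toward the aggregation/core layer.
    server_facing_gbps = 40 * 10        # 400 Gb/s offered by the servers
    uplink_gbps = 2 * 10                # 20 Gb/s toward the core
    oversub = server_facing_gbps / uplink_gbps   # 20.0, i.e. 20:1

As long as most traffic stays inside the rack this is acceptable; once VMs are 
scheduled anywhere in the cluster, it isn't.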

Once you spread the server load around, the network requirements go up to 
design points like 2:1 oversubscription. This requires a different physical 
design for the network, and it means there is no longer a pair of aggregation 
switches conveniently positioned above the rack where policies that control 
network-to-network traffic can be implemented. This is why OpenContrail 
implements network-to-network traffic policies in the ingress hypervisor 
switch and forwards traffic directly, without requiring a VirtualRouter 
appliance.
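
Conceptually (this is a toy sketch, not the actual vrouter data path), the 
ingress host checks the configured policy for the (source virtual-network, 
destination virtual-network) pair and, if allowed, tunnels the packet straight 
to the host where the destination VM lives:

    def tunnel_to(dst_host, packet):
        # Stand-in for the real encapsulation (e.g. MPLS-over-GRE or VXLAN).
        print("tunnel %r to %s" % (packet, dst_host))

    # Network-policy as configured by the admin; anything unlisted is denied.
    POLICY = {
        ('frontend', 'backend'): 'allow',
        ('frontend', 'db'): 'deny',
    }

    def forward(src_vn, dst_vn, packet, dst_host):
        if POLICY.get((src_vn, dst_vn)) == 'allow':
            # Forward directly; no VirtualRouter appliance in the middle.
            tunnel_to(dst_host, packet)
        # otherwise the packet is dropped at the source host

    forward('frontend', 'backend', 'some-packet', 'compute-07')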

Just to provide a less fluffy definition of the problem we are trying to 
solve...

> 
> Now that's not to say that a network can't provide L2 and L3
> information.  You should be able to create a network in CloudStack and
> based on the configuration you know that it is a single L2 or L3.  It
> is just that the core orchestration system can't make that fundamental
> assumption.  I'd be interested in furthering the model and maybe
> adding a concept of a L2 network such that a network guru when
> designing a network, can define multiple l2networks and associate them
> with the generic network that was created.  That idea I'm still
> toiling with.

I'd encourage you not to think about L2 networks. I've yet to see a 
"cloud-ready" application that needs anything but IP connectivity. For IP it 
doesn't matter what the underlying data-link layer looks like... emulating 
Ethernet is a rat-hole. There is no point in doing so.

> 
> For example, when configuring DHCP on the systemvm.  DHCP is a L2
> based service.

DHCP is an IP service, typically provided via a DHCP relay in the aggregation 
switch. In OpenContrail, for instance, it is provided in the hypervisor switch 
(a.k.a. the vrouter Linux kernel module).

>  So to configure DHCP you really need to know for each
> nic, what is the L2 it's attached to and what are the VMs associated
> with that L2.  Today, since there is no first class concept of a L2
> network, you have to look at the implied definition of L2.  For basic
> networks, the L2 is the Pod, so you need to list all VMs in that Pod.
> For guest/VPC networks, the L2 is the network object, so you need to
> list all VMs associated with the network.  It would be nice if when
> the guru designed the network, it also defined the l2networks, and
> then when a VM starts the guru the reserve() method could associate
> the l2network to the nic.  So the nic object would have a network_id
> and a l2_network_id.

With OpenContrail, DHCP is quite simple. The Nic uuid is known by the vrouter 
kernel module on the compute-node. When the DHCP request comes in from the 
tap/vif interface, the vrouter answers locally (it knows the relationship 
between the Nic, its properties, and the virtual-network). Please do not try 
to bring L2 into the picture. It would be very unhelpful to do so.
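
To illustrate why that stays simple (hypothetical data structures, not the 
actual vrouter/agent code): the host already knows, per tap interface, which 
Nic and virtual-network it belongs to, so a DHCP request can be answered 
without ever leaving the host:

    # Populated when the Nic is plugged; all values here are made up.
    NIC_TABLE = {
        'tap1234': {'vn': 'frontend', 'ip': '10.1.1.5',
                    'netmask': '255.255.255.0', 'gateway': '10.1.1.1',
                    'mac': '02:de:ad:be:ef:01'},
    }

    def handle_dhcp_request(tap_if):
        nic = NIC_TABLE[tap_if]
        # Answer locally with the address allocated when the port was
        # created; no relay, no broadcast onto the physical network.
        return {'yiaddr': nic['ip'], 'subnet_mask': nic['netmask'],
                'router': nic['gateway'], 'chaddr': nic['mac']}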

For most data-centers, the main networking objective is to get rid of L2 and 
its limitations. Ethernet is really complex. It has nice zero-config 
deployment for very simple networks, but at the cost of high complexity if you 
are trying to do redundancy, use multiple links, interoperate with other 
network devices, or scale... not to mention that all state is data-driven, 
which makes it really, really hard to debug. Ethernet is great as a layer 1 
point-to-point link; not as a network.

  Pedro.
