This patch adds a design document detailing the implementation of MacVTap support in Ganeti.
Signed-off-by: Dimitris Bliablias <[email protected]>
---

Hello,

After our discussions during the latest GanetiCon, I'm sending the
revised design document on providing support for the MacVTap driver in
Ganeti.

Looking forward to your feedback,
Dimitris

 Makefile.am            |   1 +
 doc/design-draft.rst   |   1 +
 doc/design-macvtap.rst | 266 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 268 insertions(+)
 create mode 100644 doc/design-macvtap.rst

diff --git a/Makefile.am b/Makefile.am
index a506296..2f8cd8e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -696,6 +696,7 @@ docinput = \
 	doc/design-location.rst \
 	doc/design-linuxha.rst \
 	doc/design-lu-generated-jobs.rst \
+	doc/design-macvtap.rst \
 	doc/design-memory-over-commitment.rst \
 	doc/design-migration-speed-hbal.rst \
 	doc/design-monitoring-agent.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index 353f0cd..247eeca 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -27,6 +27,7 @@ Design document drafts
    design-repaird.rst
    design-migration-speed-hbal.rst
    design-memory-over-commitment.rst
+   design-macvtap.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-macvtap.rst b/doc/design-macvtap.rst
new file mode 100644
index 0000000..1440ab9
--- /dev/null
+++ b/doc/design-macvtap.rst
@@ -0,0 +1,266 @@
+===============
+MacVTap support
+===============
+
+.. contents:: :depth: 3
+
+This is a design document detailing the implementation of `MacVTap`
+support in Ganeti. The initial implementation targets the KVM
+hypervisor, but it is intended to be ported to the Xen hypervisor as
+well.
+
+Current state and shortcomings
+==============================
+
+Currently, Ganeti provides a number of options for networking a virtual
+machine, namely the ``bridged``, ``routed``, and ``openvswitch``
+modes. ``MacVTap`` is another virtual network interface in Linux that
+is not supported by Ganeti and that could be added to the currently
+supported solutions.
+It is an interface that acts as a regular TUN/TAP
+device, and thus it is transparently supported by QEMU. Because of its
+design, it can greatly simplify Ganeti setups using bridged instances.
+
+In brief, the MacVTap interface is based on the ``MacVLan`` Linux
+driver, which basically allows a single physical interface to be
+associated with multiple IPs and MAC addresses. It is meant to replace
+the combination of the TUN/TAP and bridge drivers with a more
+lightweight setup that doesn't require any extra configuration on the
+host. The MacVTap driver is supposed to be more efficient than using a
+regular bridge. Unlike bridges, it doesn't need to do STP or to
+discover/learn MAC addresses of other connected devices on a given
+domain, as it knows every MAC address it can receive. In fact, it
+introduces a bridge-like behavior for virtual machines but without the
+need to have a real bridge setup on the host. Instead, each virtual
+interface extends an existing network device by attaching directly to
+it, having its own MAC address, and providing a separate virtual
+interface to be used by the userspace processes. The MacVTap MAC address
+is used on the external network and the guest OS cannot spoof or change
+that address.
+
+Background
+==========
+
+This section provides some extra information on the MacVTap interface
+that we took into account for the rest of this design document.
+
+MacVTap modes of operation
+--------------------------
+
+A MacVTap device can operate in one of four modes, just like the MacVLan
+driver does. These modes determine how the tap endpoints communicate
+with each other, providing various levels of isolation between them.
+Those modes are the following:
+
+* `VEPA (Virtual Ethernet Port Aggregator) mode`: The default mode that
+  is compatible with virtualization-enabled switches. The communication
+  between endpoints on the same lower device happens through the
+  external switch.
+
+* `Bridge mode`: It works almost like a traditional bridge, connecting
+  all endpoints directly to each other.
+
+* `Private mode`: An endpoint in this mode can never communicate with any
+  other endpoint on the same lower device.
+
+* `Passthru mode`: This mode was added later to work around some
+  limitations of MacVLans (more details here_).
+
+MacVTap internals
+-----------------
+
+The creation of a MacVTap device is *not* done by opening the
+`/dev/net/tun` device and issuing a corresponding `ioctl()` to register
+a network device as happens for tap devices. Instead, there are two ways
+to create a MacVTap device. The first one is using the `rtnetlink(7)`
+interface directly, just like the `libvirt` or the `iproute2` utilities
+do, and the second one is to use the high-level `ip-link` command. Since
+creating a MacVTap interface programmatically using the netlink protocol
+is a bit more complicated than creating a normal TUN/TAP device, we
+propose using the `ip-link` tool for the MacVTap handling, which is
+much simpler and more straightforward to use, and also fulfills all our
+needs. Additionally, since Ganeti already depends on `iproute2` being
+installed in the system, this does not introduce an extra dependency.
+
+The following example uses the `ip-link` tool to create a MacVTap
+device named `macvtap0`, operating in `bridge` mode and using `eth0`
+as its lower device:
+
+::
+
+  ip link add link eth0 name macvtap0 address 1a:36:1b:aa:b3:77 type macvtap mode bridge
+
+Once a MacVTap interface is created, an actual character device appears
+under `/dev`, called ``/dev/tapXX``, where ``XX`` is the interface index
+of the device.
+
+Proposed changes
+================
+
+In order to be able to create instances using the MacVTap device driver,
+we propose some modifications that affect the ``nicparams`` slot of
+Ganeti's configuration ``NIC`` object, and also the code related to
+the KVM hypervisor, as detailed in the following sections.
+
+Configuration changes
+---------------------
+
+The nicparams ``mode`` attribute will be extended to support the
+``macvtap`` mode. When using the MacVTap mode, the ``link`` attribute
+will specify the network device to which the MacVTap interfaces will be
+attached, the *lower device*. Note that the lower device must
+exist; otherwise the operation will fail. If no link is specified, the
+cluster-wide default NIC `link` param will be used instead.
+
+We propose that the MacVTap mode be configurable, so the nicparams
+object will be extended with an extra slot named ``mvtap_mode``. This
+parameter will only be used if the network mode is set to MacVTap since
+it does not make sense in other modes, similarly to the `vlan` slot of
+the `openvswitch` mode.
+
+Below are snippets of the output of some of the ``gnt-network``
+commands:
+
+Network connection
+~~~~~~~~~~~~~~~~~~
+
+::
+
+  gnt-network connect -N mode=macvtap,link=eth0,mvtap_mode=bridge vtap-net vtap_group
+
+Network listing
+~~~~~~~~~~~~~~~
+
+::
+
+  gnt-network list
+
+  Network  Subnet           Gateway       MacPrefix GroupList
+  br-net   10.48.1.0/24     10.48.1.254   -         default (bridged, br0, , )
+  vtap-net 192.168.100.0/24 192.168.100.1 -         vtap_group (macvtap, eth0, , bridge)
+
+Network information
+~~~~~~~~~~~~~~~~~~~
+
+::
+
+  gnt-network info
+
+  Network name: vtap-net
+    UUID: 4f139b48-3f08-46b1-911f-d37de7e12dcf
+    Serial number: 1
+    Subnet: 192.168.100.0/28
+    Gateway: 192.168.100.1
+    IPv6 Subnet: 2001:db8:2ffc::/64
+    IPv6 Gateway: 2001:db8:2ffc::1
+    Mac Prefix: None
+    size: 16
+    free: 10 (62.50%)
+    usage map:
+      0 XXXXX..........X 63
+           (X) used    (.)
+                           free
+    externally reserved IPs:
+      192.168.100.0, 192.168.100.1, 192.168.100.15
+    connected to node groups:
+      vtap_group (mode:macvtap link:eth0 vlan: mvtap_mode:bridge)
+    used by 2 instances:
+      inst1.example.com: 0:192.168.100.2
+      inst2.example.com: 0:192.168.100.3
+
+
+Hypervisor changes
+------------------
+
+A new method will be introduced in the KVM's `netdev.py` module, named
+``OpenVTap``, similar to the ``OpenTap`` method, that will be
+responsible for creating a MacVTap device using the `ip-link` command,
+and returning its file descriptor. The ``OpenVTap`` method will receive
+as arguments the network's `link`, the mode of the MacVTap device
+(``mvtap_mode``), and also the ``interface name`` of the device to be
+created, since otherwise we would not be able to retrieve and open
+the created device.
+
+Since we want the names of the MacVTap devices to be unique on the
+same node, we will make use of the existing ``_GenerateKvmTapName``
+method to generate device names, with some modifications to adapt it
+to our needs. This method is actually a wrapper over the
+``GenerateTapName`` method, which is currently used to generate TAP
+interface names for NICs meant to be used in instance communication
+using the ``gnt.com`` prefix. We propose extending this method to
+generate names for the MacVTap interface too, using the ``vtap`` prefix.
+To do so, we could add an extra boolean argument to that method, named
+`inst_comm`, to differentiate the two cases, so that the method will
+return the appropriate name depending on its usage. This argument will
+be optional and default to `True`, so as not to affect the existing API.
+
+Currently, the `OpenTap` method handles the `vhost-net`, `mq`, and the
+`vnet_hdr` features. The `vhost-net` feature will be supported as usual
+for the MacVTap devices too, and so will the `multiqueue` feature, which
+can be enabled using the `numrxqueues` and `numtxqueues` parameters of
+the `ip-link` command.
+The only drawback seems to be the `vnet_hdr`
+feature modification. For a MacVTap device this flag is enabled by
+default, and it cannot be disabled even if a user requests it.
+
+A last hypervisor change will be the introduction of a new method named
+``_RemoveStaleMacvtapDevs`` that will remove any remaining MacVTap
+devices, and which is detailed in the following section.
+
+Tools changes
+-------------
+
+Some of the Ganeti tools should also be extended to support MacVTap
+devices. Those are the ``kvm-ifup`` and ``net-common`` scripts. These
+modifications will include a new method named ``setup_macvtap`` that
+will simply change the device status to `UP` just before an instance is
+started:
+
+::
+
+  ip link set $INTERFACE up
+
+As mentioned in the `Background` section, MacVTap devices are
+persistent. So, we have to manually delete the MacVTap device after an
+instance shutdown. To do so, we propose creating a ``kvm-ifdown``
+script, that will be invoked after an instance shutdown in order to
+remove the relevant MacVTap devices. The ``kvm-ifdown`` script should
+explicitly call the following commands and currently will be functional
+for MacVTap NICs only:
+
+::
+
+  ip link set $INTERFACE down
+  ip link delete $INTERFACE
+
+To be able to call the `kvm-ifdown` script we should extend the KVM's
+``_ConfigureNIC`` method with an extra argument that is the name of the
+script to be invoked, instead of always calling the `kvm-ifup`
+script, as currently happens.
+
+The invocation of the `kvm-ifdown` script will be made through a
+separate method that we will create, named ``_RemoveStaleMacvtapDevs``.
+This method will read the NIC runtime files of an instance and will
+remove any devices using the MacVTap interface. This method will be
+called from the ``CleanupInstance`` method in order to cover all the
+cases where an instance using MacVTap NICs needs to be cleaned up.
+
+Besides the instance shutdown, there are a couple of cases where the
+MacVTap NICs will need to be cleaned up too. In case of an internal
+instance shutdown, where the ``kvmd`` is not enabled, the instance will
+be in ``ERROR_DOWN`` state. In that case, when the instance is started
+either by the `ganeti-watcher` or by the admin, the ``CleanupInstance``
+method, and consequently the `kvm-ifdown` script, will not be called and
+so the MacVTap NICs will have to be deleted manually. Otherwise starting
+the instance will result in more than one MacVTap device using the same
+MAC address. An instance migration is another case where deleting an
+instance will leave stale MacVTap devices on the source node. In order
+to solve those potential issues, we will explicitly call the
+``_RemoveStaleMacvtapDevs`` method after a successful instance migration
+on the source node, and also before creating a new device for a NIC that
+is using the MacVTap interface to remove any stale devices.
+
+.. _here: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61824/
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
-- 
2.1.4
