This patch adds a design document detailing the implementation providing support for the MacVTap device driver in Ganeti.
Signed-off-by: Dimitris Bliablias <[email protected]>
---

Hello,

This design document describes the implementation for providing support
for the MacVTap device driver in Ganeti, an interface that could greatly
simplify Ganeti setups using bridged instances.

Looking forward to your feedback,
Dimitris

 Makefile.am            |   1 +
 doc/design-draft.rst   |   1 +
 doc/design-macvtap.rst | 262 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 264 insertions(+)
 create mode 100644 doc/design-macvtap.rst

diff --git a/Makefile.am b/Makefile.am
index 5068050..b65706d 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -685,6 +685,7 @@ docinput = \
 	doc/design-location.rst \
 	doc/design-linuxha.rst \
 	doc/design-lu-generated-jobs.rst \
+	doc/design-macvtap.rst \
 	doc/design-monitoring-agent.rst \
 	doc/design-move-instance-improvements.rst \
 	doc/design-multi-reloc.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index c589b56..ac107ad 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -28,6 +28,7 @@ Design document drafts
    design-dedicated-allocation.rst
    design-allocation-efficiency.rst
    design-shared-storage-redundancy.rst
+   design-macvtap.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-macvtap.rst b/doc/design-macvtap.rst
new file mode 100644
index 0000000..e6d9239
--- /dev/null
+++ b/doc/design-macvtap.rst
@@ -0,0 +1,262 @@
+===============
+MacVTap support
+===============
+
+.. contents:: :depth: 3
+
+This is a design document detailing the implementation providing
+support for the `MacVTap` device driver in Ganeti. The initial
+implementation will target the KVM hypervisor, but it is intended to be
+ported to the Xen hypervisor as well.
+
+Current state and shortcomings
+==============================
+
+Currently, Ganeti provides a number of options for networking a virtual
+machine, i.e., ``bridged``, ``routed``, and ``openvswitch`` modes.
+``MacVTap`` is another virtual network interface in Linux that is not
+supported by Ganeti and that we could add to the currently supported
+solutions. It is an interface that acts as a regular TUN/TAP device,
+and thus it is transparently supported by QEMU. Because of its
+operation, it can greatly simplify Ganeti setups using bridged
+instances.
+
+In brief, it is an interface based on the ``macvlan`` Linux driver,
+meant to replace the combination of the TUN/TAP and bridge drivers with
+a simplified setup that doesn't need to do learning or STP as it knows
+every MAC address it can receive. In fact, it introduces a bridge-like
+behavior of virtual machines but without the need to have a real bridge
+setup on the host. Instead, each virtual interface extends an existing
+network device by attaching directly to it, and has its own MAC address
+providing a separate virtual interface to be used by the userspace
+processes. The MacVTap MAC address is used on the external network and
+the guest OS cannot spoof or change that address.
+
+Background
+==========
+
+This section gives some extra information on the MacVTap interface that
+we took into account for the rest of this design document.
+
+MacVTap modes of operation
+--------------------------
+
+A MacVTap device can operate in one of four modes, like the macvlan
+driver does, that are defined at creation time and determine how the
+tap endpoints communicate with each other. Those are the following:
+
+* `VEPA (Virtual Ethernet Port Aggregator) mode`: The default mode that
+  is compatible with virtualization-enabled switches. The communication
+  between endpoints on the same lower device happens through the
+  external switch.
+
+* `Bridge mode`: It works almost like a traditional bridge, connecting
+  all endpoints directly to each other.
+
+* `Private mode`: An endpoint in this mode can never communicate with
+  any other endpoint on the same lower device.
+
+* `Passthru mode`: This mode was added later to work around some
+  limitations of macvlans (more details here_).
+
+MacVTap internals
+-----------------
+
+The creation of a MacVTap device is not done by opening the
+`/dev/net/tun` device and issuing a corresponding `ioctl()` to register
+a network device as happens in tap devices. Instead, there are two ways
+to create a MacVTap device. The first one is using the `rtnetlink(7)`
+interface directly, just like the `libvirt` or the `iproute2` utilities
+do, and the second one is to use the high-level `ip-link` command. Since
+creating a MacVTap interface programmatically using the netlink protocol
+is a bit more complicated than creating a normal TUN/TAP device, we
+propose using the `ip-link` tool for the MacVTap handling, which is
+simpler and more straightforward to use, and also fulfills all our
+needs. Since Ganeti already depends on `iproute2` being installed in the
+system, this does not introduce an extra dependency.
+
+The following example creates a MacVTap device using the `ip-link`
+tool, named `macvtap0`, operating in `bridge` mode, and using
+`eth0` as its lower device:
+
+::
+
+  ip link add link eth0 name macvtap0 address 1a:36:1b:aa:b3:77 type macvtap mode bridge
+
+Once a MacVTap interface is created, an actual character device appears
+under `/dev`, called ``/dev/tapXX``, where ``XX`` is the interface index
+of the device.
+
+Proposed changes
+================
+
+In order to be able to create instances using the MacVTap device driver,
+we propose some modifications that affect the ``nicparams`` slot of
+Ganeti's configuration ``NIC`` object, and also the code related to
+the KVM hypervisor, as detailed in the following sections.
+
+Configuration changes
+---------------------
+
+The nicparams ``mode`` attribute will be extended to support the
+``macvtap`` mode. When using the MacVTap mode, the ``link`` attribute
+will specify the network device that the MacVTap interfaces will be
+attached to (the lower device). Note that the lower device should
+exist, otherwise the operation will fail. If no link is specified, the
+cluster-wide default NIC `link` param will be used instead.
+
+We propose the MacVTap mode to be configurable, and so the nicparams
+object will be extended with an extra slot named ``mvtap_mode``. This
+parameter will only be used if the network mode is set to MacVTap since
+it does not make sense in other modes, similarly to the `vlan` slot of
+the `openvswitch` mode.
+
+Below is a snippet of the output of some of the ``gnt-network``
+commands:
+
+Network connection
+~~~~~~~~~~~~~~~~~~
+
+::
+
+  gnt-network connect -N mode=macvtap,link=eth0,mvtap_mode=bridge vtap-net vtap_group
+
+Network listing
+~~~~~~~~~~~~~~~
+
+::
+
+  gnt-network list
+
+  Network  Subnet           Gateway       MacPrefix GroupList
+  br-net   10.48.1.0/24     10.48.1.254   -         default (bridged, br0, , )
+  vtap-net 192.168.100.0/24 192.168.100.1 -         vtap_group (macvtap, eth0, , bridge)
+
+Network information
+~~~~~~~~~~~~~~~~~~~
+
+::
+
+  gnt-network info
+
+  Network name: vtap-net
+  UUID: 4f139b48-3f08-46b1-911f-d37de7e12dcf
+  Serial number: 1
+  Subnet: 192.168.100.0/28
+  Gateway: 192.168.100.1
+  IPv6 Subnet: 2001:db8:2ffc::/64
+  IPv6 Gateway: 2001:db8:2ffc::1
+  Mac Prefix: None
+  size: 16
+  free: 10 (62.50%)
+  usage map:
+    0 XXXXX..........X 63
+      (X) used    (.) free
+  externally reserved IPs:
+    192.168.100.0, 192.168.100.1, 192.168.100.15
+  connected to node groups:
+    vtap_group (mode:macvtap link:eth0 vlan: mvtap_mode:bridge)
+  used by 2 instances:
+    inst1.example.com: 0:192.168.100.2
+    inst2.example.com: 0:192.168.100.3
+
+
+Hypervisor changes
+------------------
+
+A new method will be introduced in the KVM's `netdev.py` module, named
+``OpenVTap``, similar to the ``OpenTap`` method, that will be
+responsible for creating a MacVTap device using the `ip-link` command,
+and returning its file descriptor. The ``OpenVTap`` method will receive
+as arguments the network's `link`, the mode of the MacVTap device
+(``mvtap_mode``), and also the ``interface name`` of the device to be
+created, since otherwise we would not be able to locate and open the
+created device.
+
+Since we want the names among the MacVTap devices to be unique on the
+same node, we will make use of the existing ``_GenerateKvmTapName``
+method to generate device names but with some modifications, to adapt it
+to our needs. This method is actually a wrapper over the
+``GenerateTapName`` method which currently is used to generate TAP
+interface names for NICs meant to be used in instance communication
+using the `gnt.com` prefix. We propose extending this method to generate
+names for the MacVTap interface too, using the `vtap` prefix. To do so,
+we could add an extra boolean argument in this method, named
+`instance_comm`, to differentiate the two cases so the method returns
+the appropriate name depending on its usage. This argument will be
+optional and default to `True`, to not affect the existing API.
+
+Currently, the `OpenTap` method handles the `vhost-net`, `mq`, and the
+`vnet_hdr` features. The `vhost-net` feature will be normally supported
+for the MacVTap devices too, and so will the `multiqueue` feature, which
+can be enabled using the `numrxqueues` and `numtxqueues` parameters of
+the `ip-link` command. The only drawback seems to be the `vnet_hdr`
+feature modification. For a MacVTap device this flag is enabled by
+default, and it cannot be disabled even if the user requests it.
+
+A final hypervisor change will be the introduction of a new method named
+``_RemoveStaleMacvtapDevices`` that will remove any remaining MacVTap
+devices, and which is detailed in the following section.
+
+Tools changes
+-------------
+
+Some of the Ganeti tools should also be extended to support MacVTap
+devices. Those are the ``kvm-ifup`` and ``net-common`` scripts. Those
+modifications will include a new method named ``setup_macvtap`` that
+will simply change the device status to `UP` just before we start an
+instance:
+
+::
+
+  ip link set $INTERFACE up
+
+As mentioned in the `Background` section, MacVTap devices are
+persistent. So, we have to manually delete the MacVTap device after an
+instance shutdown. To do so, we propose creating a ``kvm-ifdown``
+script, that will be invoked after an instance shutdown in order to
+remove the remaining MacVTap devices. The ``kvm-ifdown`` should
+explicitly call the following commands and will be functional for
+MacVTap NICs only:
+
+::
+
+  ip link set $INTERFACE down
+  ip link delete $INTERFACE
+
+To be able to call the `kvm-ifdown` script we should extend the KVM's
+``_ConfigureNIC`` method with an extra argument that is the name of the
+script to be invoked, instead of calling the `kvm-ifup` script by
+default, as currently happens.
+
+The invocation of the `kvm-ifdown` script will be made through a
+separate method we will create, named ``_RemoveStaleMacvtapDevices``.
+This method will read the NIC runtime files of the instance and will
+remove any devices using the MacVTap interface. This method will be
+included in the ``CleanupInstance`` method in order to cover all the
+cases where an instance using MacVTap NICs needs to be cleaned up.
+
+Besides the instance shutdown, there are a couple of cases where the
+MacVTap NICs will need to be cleaned up too. In case of an internal
+instance shutdown, where the ``kvmd`` is not enabled, the instance will
+be in ``ERROR_DOWN`` state. In that case, when the instance is started
+either by the `ganeti-watcher` or by the admin, the ``CleanupInstance``
+method, and consequently the `kvm-ifdown` script, will not be called
+and so the MacVTap NICs will have to be deleted manually. Otherwise
+starting the instance will result in more than one MacVTap device using
+the same MAC address. An instance migration is another case where
+deleting an instance will keep stale MacVTap devices on the source node.
+In order to solve those potential issues, we will explicitly call the
+``_RemoveStaleMacvtapDevices`` method after a successful instance
+migration on the source node, and also before creating a new device for
+a NIC which is using the macvtap interface to remove any remaining
+MacVTap devices.
+
+
+.. _here: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61824/
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
--
2.1.4
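
For reviewers, here is a minimal sketch (not part of the patch) of the device lifecycle the design document describes: create with `ip link add`, bring up, then bring down and delete. It only builds the argument vectors and the ``/dev/tapXX`` path, so it can be read and run without root. All function names are hypothetical and are not Ganeti's API; the `ip link` arguments mirror the commands quoted in the document.

```python
# Hypothetical helpers illustrating the MacVTap lifecycle from the design
# document. None of these names exist in Ganeti; they are assumptions.

# The four modes listed in the "MacVTap modes of operation" section.
MACVTAP_MODES = ("vepa", "bridge", "private", "passthru")


def macvtap_add_cmd(link, name, mode="vepa", mac=None):
    """Build the `ip link add` argv for a MacVTap device on lower device `link`."""
    if mode not in MACVTAP_MODES:
        raise ValueError("unsupported MacVTap mode: %r" % (mode,))
    cmd = ["ip", "link", "add", "link", link, "name", name]
    if mac is not None:
        cmd += ["address", mac]
    cmd += ["type", "macvtap", "mode", mode]
    return cmd


def macvtap_up_cmd(name):
    """Build the argv that the proposed setup_macvtap step would run."""
    return ["ip", "link", "set", name, "up"]


def macvtap_remove_cmds(name):
    """Build the two argvs the proposed kvm-ifdown script would run, in order."""
    return [
        ["ip", "link", "set", name, "down"],
        ["ip", "link", "delete", name],
    ]


def tap_char_device(ifindex):
    """Return the character device path for a MacVTap interface index."""
    return "/dev/tap%d" % ifindex


if __name__ == "__main__":
    # Reproduces the example command from the "MacVTap internals" section.
    print(" ".join(macvtap_add_cmd("eth0", "macvtap0", "bridge",
                                   "1a:36:1b:aa:b3:77")))
```

Keeping command construction separate from execution is only a convenience for this sketch; how the proposed ``OpenVTap`` method would actually invoke `ip-link` is left to the patch series.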
