This is the design behind the first hotplug implementation for the KVM hypervisor.
Signed-off-by: Dimitris Aragiorgis <[email protected]>
---
Hello team,

This is the updated design doc for hotplug. It includes all
modifications/suggestions that have been discussed in the last thread.
I will wait for your comments or eventually the final ACK so that I can
proceed with implementation patches.

Thanks a lot,
dimara

 Makefile.am            |    1 +
 doc/design-draft.rst   |    1 +
 doc/design-hotplug.rst |  222 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 224 insertions(+)
 create mode 100644 doc/design-hotplug.rst

diff --git a/Makefile.am b/Makefile.am
index 91f3f37..fda6f58 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -435,6 +435,7 @@ docinput = \
   doc/design-cpu-pinning.rst \
   doc/design-device-uuid-name.rst \
   doc/design-draft.rst \
+  doc/design-hotplug.rst \
   doc/design-htools-2.3.rst \
   doc/design-http-server.rst \
   doc/design-impexp2.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index 0e454cd..4c1c692 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -20,6 +20,7 @@ Design document drafts
    design-internal-shutdown.rst
    design-glusterfs-ganeti-support.rst
    design-openvswitch.rst
+   design-hotplug.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-hotplug.rst b/doc/design-hotplug.rst
new file mode 100644
index 0000000..ff4ff95
--- /dev/null
+++ b/doc/design-hotplug.rst
@@ -0,0 +1,222 @@
+=======
+Hotplug
+=======
+
+.. contents:: :depth: 4
+
+This is a design document detailing the implementation of device
+hotplugging in Ganeti. The logic used is hypervisor agnostic but still
+the initial implementation will target the KVM hypervisor. The
+implementation adds ``python-fdsend`` as a new dependency.
+
+
+Current state and shortcomings
+==============================
+
+Currently, Ganeti supports adding, removing and modifying devices (NICs,
+Disks) but the actual modification takes place only after rebooting the
+instance. Thus an instance cannot change network, get a new disk etc.
+without a hard reboot.
+
+Until now, in the case of the KVM hypervisor, the code neither names
+devices nor places them in specific PCI slots. Devices are appended to
+the KVM command and Ganeti lets KVM decide where to place them. This
+means that a device that resides in PCI slot 5 may, after a reboot (for
+example because another device was removed), be moved to another PCI
+slot and probably get renamed too (due to udev rules, etc.).
+
+For migration to succeed, the process on the target node should be
+started with exactly the same machine version, CPU architecture and PCI
+configuration as the running process. During instance creation/startup
+Ganeti creates a KVM runtime file with all the necessary information to
+generate the KVM command. This runtime file is used during instance
+migration to start a new identical KVM process. The current format
+includes the fixed part of the final KVM command, a list of NICs,
+and the hvparams dict. It does not favor easy manipulation of
+disks, because they are encapsulated in the fixed KVM command.
+
+Proposed changes
+================
+
+For the case of the KVM hypervisor, QEMU exposes 32 PCI slots to the
+instance. Disks and NICs occupy some of these slots. Recent versions of
+QEMU have introduced monitor commands that allow addition/removal of PCI
+devices. Devices are referenced based on their name or position on the
+virtual PCI bus. To be able to use these commands, we need to be able to
+assign each device a unique name.
+
+To keep track of where each device is plugged in, we add the
+``pci`` slot to Disk and NIC objects, but we save it only in runtime
+files, since it is hypervisor specific info. This is added for easy
+object manipulation and is ensured not to be written back to the config.
+
+We propose to use QEMU 1.0 monitor commands so that modifications to
+devices take effect instantly without the need for a hard reboot.
+The only change exposed to the end-user will be the addition of
+a ``--hotplug`` option to the ``gnt-instance modify`` command.
+
+Upon hotplugging, the PCI configuration of an instance changes.
+Runtime files should be updated correspondingly. Currently this is
+impossible for disk hotplug because disks are included in the command
+line entry of the runtime file, contrary to NICs, which are correctly
+treated separately. We change the format of runtime files: we remove
+disks from the fixed KVM command and create a new entry containing them
+only. KVM options concerning disks are generated during
+``_ExecuteKVMCommand()``, just like NICs.
+
+Design decisions
+================
+
+What should each device ID be? Currently KVM does not support arbitrary
+IDs for devices; only names starting with a letter, at most 32 chars
+long, and including only '.' '_' '-' as special chars are supported.
+We use the device PCI slot and name it after <device type>-pci-<slot>
+(for debugging purposes we could add a part of uuid as well).
+
+Who decides where to hotplug each device? Since this is a
+hypervisor specific matter, there is no point for the master node to
+decide such a thing. Master node just has to request noded to hotplug a
+device. To this end, hypervisor specific code should parse the current
+PCI configuration (i.e. ``info pci`` QEMU monitor command), find the first
+available slot and hotplug the device. By having noded decide where to
+hotplug a device we ensure that no error will occur due to duplicate
+slot assignment (if masterd keeps track of PCI reservations and noded
+fails to return the PCI slot that the device was plugged into then next
+hotplug will fail).
+
+Where should we keep track of devices' PCI slots? As already mentioned,
+we must keep track of devices' PCI slots to successfully migrate
+instances. First option is to save this info to config data, which would
+allow us to place each device at the same PCI slot after reboot.
+This would require the hypervisor to return the PCI slot chosen for each
+device and this information to be stored in config data. Additionally,
+the whole instance configuration should be returned with PCI slots
+filled after instance start and each instance should keep track of
+current PCI reservations. We decide not to go in this direction in order
+to keep it simple and not add hypervisor specific info to configuration
+data (``pci_reservations`` at instance level and ``pci`` at device
+level). For the aforementioned reason, we decide to store this info only
+in KVM runtime files.
+
+Where to place the devices upon instance startup? QEMU has by default 4
+pre-occupied PCI slots. So, hypervisor can use the remaining ones for
+disks and NICs. Currently, PCI configuration is not preserved after
+reboot. Each time an instance starts, KVM assigns PCI slots to devices
+based on their ordering in Ganeti configuration, i.e. the second disk
+will be placed after the first, the third NIC after the second, etc.
+Since we decided not to store devices' PCI slots in the configuration,
+there is no need to change current functionality.
+
+How to deal with existing instances? Hotplug depends on runtime file
+manipulation. It stores there PCI info and every device the KVM process is
+currently using. Existing files have no PCI info in devices and have block
+devices encapsulated inside kvm_cmd entry. Thus hotplugging of existing devices
+will not be possible. Still migration and hotplugging of new devices will
+succeed. The workaround will happen upon loading kvm runtime: if we detect old
+style format we will add an empty list for block devices and upon saving kvm
+runtime we will include this empty list as well. Switching entirely to new
+format will happen upon instance reboot.
+
+
+Configuration changes
+---------------------
+
+The ``NIC`` and ``Disk`` objects get one extra slot: ``pci``. It refers to
+the PCI slot that the device gets plugged into.
+
+In order to be able to live migrate successfully, runtime files should
+be updated every time a live modification (hotplug) takes place. To this
+end we change the format of runtime files. The KVM options referring to
+instance's disks are no longer recorded as part of the KVM command line.
+Disks are treated separately, just as we treat NICs right now. We insert
+and remove entries to reflect the current PCI configuration.
+
+
+Backend changes
+---------------
+
+Introduce one new RPC call:
+
+- hotplug_device(DEVICE_TYPE, ACTION, device, ...)
+
+where DEVICE_TYPE can be either NIC or Disk, and ACTION either REMOVE or ADD.
+
+Hypervisor changes
+------------------
+
+We implement hotplug on top of the KVM hypervisor. We take advantage of
+QEMU 1.0 monitor commands (``device_add``, ``device_del``,
+``drive_add``, ``drive_del``, ``netdev_add``, ``netdev_del``). QEMU
+refers to devices based on their id. We use ``uuid`` to name them
+properly. If a device is about to be hotplugged we parse the output of
+``info pci`` and find the occupied PCI slots. We choose the first
+available and the whole device object is appended to the corresponding
+entry in the runtime file.
+
+Concerning NIC handling, we build on top of the existing logic
+(first create a tap with _OpenTap() and then pass its file descriptor to
+the KVM process). To this end we need to pass access rights to the
+corresponding file descriptor over the monitor socket (UNIX domain
+socket). The open file is passed as a socket-level control message
+(SCM), using the ``fdsend`` python library.
+
+
+User interface
+--------------
+
+The new ``--hotplug`` option to gnt-instance modify is introduced, which
+forces live modifications.
+
+
+Enabling hotplug
+++++++++++++++++
+
+Hotplug will be optional during gnt-instance modify. For existing
+instances, after installing a version that supports hotplugging, hotplug
+will not be supported for existing devices.
+The reason is that old runtime files lack:
+
+1. Device pci configuration info.
+
+2. Separate block device entry.
+
+Hotplug will be supported only for KVM in the first implementation. For
+all other hypervisors, backend will raise an exception if hotplug is
+requested.
+
+
+NIC hotplug
++++++++++++
+
+The user can add/modify/remove NICs either with hotplugging or not. If a
+NIC is to be added, a tap is created first and configured properly with
+kvm-vif-bridge script. Then the instance gets a new network interface.
+Since there is no QEMU monitor command to modify a NIC, we modify a NIC
+by temporarily removing the existing one and adding a new one with the
+new configuration. When removing a NIC the corresponding tap gets
+removed as well.
+
+::
+
+  gnt-instance modify --net add --hotplug test
+  gnt-instance modify --net 1:mac=aa:00:00:55:44:33 --hotplug test
+  gnt-instance modify --net 1:remove --hotplug test
+
+
+Disk hotplug
++++++++++++++
+
+The user can add and remove disks with hotplugging or not. QEMU monitor
+supports resizing of disks, however the initial implementation will
+support only disk addition/deletion.
+
+::
+
+  gnt-instance modify --disk add:size=1G --hotplug test
+  gnt-instance modify --disk 1:remove --hotplug test
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
-- 
1.7.10.4
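
Note (illustration only, not part of the patch): the slot selection and
device naming described under "Design decisions" could be sketched in
Python roughly as follows. The assumed shape of the ``info pci`` output
(header lines such as "Bus 0, device 3, function 0:"), the function
names and the ``PCI_SLOTS`` constant are all assumptions made for this
sketch; they are not Ganeti or QEMU APIs.

```python
import re

# ASSUMPTION: each device in the ``info pci`` output starts with a header
# line shaped like "  Bus  0, device   3, function 0:".
_PCI_HEADER_RE = re.compile(
    r"^\s*Bus\s+(\d+),\s+device\s+(\d+),\s+function\s+\d+:", re.MULTILINE)

# Naming rule stated in the design: starts with a letter, at most 32
# chars, only '.', '_' and '-' as special chars.
_DEVICE_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9._-]{0,31}$")

PCI_SLOTS = 32  # number of slots QEMU exposes, per the design doc


def occupied_slots(info_pci_output, bus=0):
    """Return the set of slot numbers in use on the given bus."""
    slots = set()
    for match in _PCI_HEADER_RE.finditer(info_pci_output):
        if int(match.group(1)) == bus:
            slots.add(int(match.group(2)))
    return slots


def first_free_slot(occupied, total=PCI_SLOTS):
    """Return the lowest slot number that is not occupied."""
    for slot in range(total):
        if slot not in occupied:
            return slot
    raise RuntimeError("no free PCI slot left")


def device_id(dev_type, slot):
    """Build an ID of the form <device type>-pci-<slot> and validate it."""
    name = "%s-pci-%d" % (dev_type, slot)
    if not _DEVICE_ID_RE.match(name):
        raise ValueError("invalid device id: %r" % name)
    return name
```

A caller would feed the monitor output to ``occupied_slots()``, pass the
result to ``first_free_slot()`` and then name the new device with
``device_id()`` before issuing the hotplug command.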
