Add the document describing a new design for the OS installation process for new instances.
Signed-off-by: Michele Tartara <[email protected]>
---
 doc/design-draft.rst |    1 +
 doc/design-os.rst    |  318 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 319 insertions(+)
 create mode 100644 doc/design-os.rst

diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index c821292..3ed3852 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -20,6 +20,7 @@ Design document drafts
    design-daemons.rst
    design-hsqueeze.rst
    design-ssh-ports.rst
+   design-os.rst

 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-os.rst b/doc/design-os.rst
new file mode 100644
index 0000000..7a42a7f
--- /dev/null
+++ b/doc/design-os.rst
@@ -0,0 +1,318 @@
+===============================
+Ganeti OS installation redesign
+===============================
+
+.. contents:: :depth: 3
+
+This is a design document detailing a new OS installation procedure that is
+more secure, provides more features, and is easier to use for many common tasks
+than the current one.
+
+Current state and shortcomings
+==============================
+
+As of Ganeti 2.10, each instance is associated with an OS definition. An OS
+definition is a set of scripts (``create``, ``export``, ``import``, ``rename``)
+that are executed with root privileges on the primary host of the instance to
+perform all the OS-related functionality (setting up an operating system inside
+the disks of the instance being created, exporting/importing the instance,
+renaming it).
+
+These scripts receive, as environment variables, a fixed set of parameters
+describing the instance (such as the hypervisor, the name of the instance, the
+number of disks, and their location) and a set of user defined parameters. Each
+of these parameters is also written into the configuration file of Ganeti, to
+allow for future reinstalls of the instance, and in various log files, namely:
+
+* node daemon log file: contains DEBUG strings of the ``/os_validate``,
+  ``/instance_os_add`` and ``/instance_start`` RPC calls.
+
+* master daemon log file: DEBUG strings related to the same RPC calls are stored
+  here as well.
+
+* commands log: the CLI commands that create a new instance, including their
+  parameters, are logged here.
+
+* RAPI log: the RAPI commands that create new instances, including their
+  parameters, are logged here.
+
+* job logs: the job files stored in the job queue or in its archive contain the
+  parameters.
+
+The current situation presents a number of shortcomings:
+
+* Having the installation scripts run with root power on the nodes is a huge
+  security issue.
+
+* Ganeti cannot be used to create instances starting from user provided disk
+  images: even in the (hypothetical) case where the scripts are completely
+  secure and run not by root but by an unprivileged user with only the power to
+  mount arbitrary files as disk images, this is a security issue. It has been
+  proven that a carefully crafted file system might exploit kernel
+  vulnerabilities to gain control of the system. Therefore, directly mounting
+  images on the Ganeti nodes is not an option.
+
+* There is no way to inject files into an existing disk image. A common use case
+  is for the system administrator to provide a standard image of the system, to
+  be later personalized with the network configuration, private keys identifying
+  the machine, ssh keys of the users and so on. A possible workaround would be
+  for the scripts to mount the image (only if this is trusted!) and to receive
+  the configurations and ssh keys as user defined OS parameters. Unfortunately,
+  this is also not an option for security sensitive material (such as the ssh
+  keys) because the OS parameters are stored in many places on the system, as
+  already described above.
+
+* Most other virtualization software simply works with instance images, not with
+  installation scripts. This difference makes the interaction of Ganeti with
+  other software difficult.
+
+Proposed changes
+================
+
+In order to fix the shortcomings of the current state, we plan to introduce the
+following changes:
+
+* Change the OS parameters to have three categories:
+
+  * ``public``: the current behavior. The parameter is logged and stored freely.
+
+  * ``private``: the parameter is saved inside the Ganeti configuration (to allow
+    for instance reinstall) but it is not shown in logs or job logs, nor passed
+    back via RAPI.
+
+  * ``secret``: the parameter is not saved inside the Ganeti configuration.
+    Reinstalls are impossible unless the data is passed again. The parameter will
+    not appear in any log file. In order to preserve the functionality of Ganeti,
+    the parameters will still need to be stored in the job files, but they will
+    be removed from there when the job has finished running (either successfully
+    or not).
+
+* A new OS installation procedure, based on a safe virtualized environment.
+  This virtualized environment will run with the same hardware parameters as the
+  actual instance being installed, as much as possible. This will also make it
+  possible to reduce the memory usage in the host (specifically, in Dom0 for Xen
+  installations). Each instance will have these possible execution modes:
+
+  * ``run``: the default mode, used when the machine is running normally.
+
+  * ``self_install``: Ganeti will start the instance with a different set of
+    user-specified parameters, making it possible to attach an installation
+    floppy/cdrom/network, change the boot device order, or specify an OS image
+    to be used. The instance will then be responsible for getting the parameters
+    for configuring itself (its network interfaces, IP address, hostname, etc.)
+    from a set of metadata provided to it by Ganeti (e.g.: using an approach
+    comparable to the one of the ``cloud-init`` tool). When this installation
+    mode is used, no OS installation script is required.
+    In order for installation of an OS from an image to be possible, a new
+    parameter ``--os-image`` will be added, allowing the user to specify where
+    to take the image from. It will be mutually exclusive with ``--os-type``. If
+    ``--os-image`` is specified, ``--os-parameters`` can still be used, as it
+    will be passed to the instance as part of the metadata.
+    The set of ``self_install`` parameters will be stored as part of the
+    instance configuration, so that they can be used to reinstall the instance.
+    It will be the user's responsibility to ensure that the OS image or any
+    installation media is still available in the proper position when a
+    reinstall happens.
+
+  * ``install``: Ganeti will start the instance using a virtual appliance
+    specifically made for installing Ganeti instances. Scripts analogous to the
+    current ones will run inside this instance. The disks of the instance being
+    installed will be connected to this virtual appliance, so that the scripts
+    can mount them and modify them as needed, as currently happens, but with the
+    additional protection given by this happening in a VM. The virtual appliance
+    will be started in a clean state every time a new instance needs to be
+    created, to further increase security. Metadata will also be provided to
+    this virtual appliance, which will take care of converting them to
+    environment variables for the installation scripts.
+
+In order to allow for the metadata to be sent inside the instance, a
+communication mechanism between the instance and the host will be created. This
+mechanism will be bidirectional (e.g.: to allow the setup process going on
+inside the instance to communicate its progress to the host). Each instance will
+have access exclusively to its own metadata, and it will be only able to
+communicate with its host over this channel.
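
The three parameter categories proposed above (``public``, ``private``, ``secret``) can be illustrated with a minimal Python sketch. The function names and the sample parameters are hypothetical and are not part of Ganeti. The sketch only models which parameters may reach the configuration and which may reach the logs, and it assumes that non-public parameters are omitted from logs entirely rather than masked.

```python
# Hypothetical sketch: all names below are illustrative, not Ganeti APIs.
PUBLIC, PRIVATE, SECRET = "public", "private", "secret"


def params_for_config(params):
    """Return the parameters that may be saved in the configuration.

    Public and private parameters are kept so that reinstalls work;
    secret ones are dropped and must be passed again by the user.
    """
    return {name: (value, vis)
            for name, (value, vis) in params.items()
            if vis != SECRET}


def params_for_log(params):
    """Return the parameter values that may appear in logs (public only)."""
    return {name: value
            for name, (value, vis) in params.items()
            if vis == PUBLIC}


# Made-up sample parameters, one per visibility category.
params = {
    "dhcp": ("yes", PUBLIC),
    "root_password": ("hunter2", PRIVATE),
    "api_token": ("abc123", SECRET),
}
print(sorted(params_for_config(params)))
print(params_for_log(params))
```

The temporary storage of secret parameters in job files, and their removal once the job finishes, is not modelled here.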
+
+As part of the instance creation command it will be possible to indicate a URL
+for a "personalization package", that is an archive containing a set of files
+meant to be overlaid on top of the operating system file system at the end of
+the setup process, before the VM is started for the first time in ``run`` mode.
+Ganeti will provide a mechanism for receiving and unpacking this archive as part
+of the ``install`` execution mode, whereas in ``self_install`` mode it will only
+be provided as metadata for the instance to use.
+The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or ``.tgz``)
+and will contain the files according to the directory structure that will be
+recreated on the installation disk. Files contained in this archive will
+overwrite files with the same path created during the install procedure (if
+any).
+The URL of the "personalization package" will have to specify an extension to
+identify the file format (in order to allow for more formats to be supported in
+the future).
+The URL will be stored as part of the configuration of the instance (therefore,
+the URL should not contain confidential information, but the file available
+there can). It is up to the system administrator to ensure that a package
+is actually available at that URL at install and reinstall time.
+The content of the package is allowed to change. E.g.: a system administrator
+might create a package containing the private keys of the instance being
+created. When the instance is reinstalled, a new package with new keys can be
+made available there, therefore allowing instance reinstall without the need to
+store keys.
+
+Implementation
+==============
+
+The implementation of this design will happen as an ordered sequence of steps,
+of increasing impact on the system and, in some cases, dependent on each other:
+
+#. Private and secret instance parameters
+#. Communication mechanism between host and instance
+#. Metadata service
+#. Personalization package
+#. ``self_install`` mode
+#. ``install`` mode (with virtualization environment)
+
+Some of these steps need to be more deeply specified w.r.t. what is already
+written in the `Proposed changes`_ Section. Extra details will be provided in
+the following Subsections.
+
+Communication mechanism and metadata service
+++++++++++++++++++++++++++++++++++++++++++++
+
+The communication mechanism and the metadata service are described together
+because they are deeply tied. On the other hand, the communication mechanism
+will need to be more generic because it can be used for other purposes in the
+future (like allowing instances to explicitly send commands to Ganeti, or to let
+Ganeti control a helper instance, like the one hereby introduced for performing
+OS installs inside a safe environment).
+
+The communication mechanism will be enabled automatically when the instance is
+in ``self_install`` or ``install`` mode, but for backwards compatibility it will
+be disabled when the instance is in ``run`` mode unless it is explicitly
+requested at instance startup by using a new, ad-hoc, parameter
+(``--communication``).
+
+When the communication mechanism is enabled, Ganeti will create a new network
+interface inside the instance. This extra network interface will be the last one
+of the instance, after all the user defined ones. On the host side, this
+interface will be only accessible to the host itself, and not be routed outside
+the machine.
+On this network interface, the instance will connect using the IP:
+169.254.169.1 and netmask 255.255.255.0.
+The host will be on the same network, with the IP address: 169.254.169.254.
+The instance will be able to connect to 169.254.169.254:80, and issue GET
+requests to an HTTP server that will provide the instance metadata.
+
+The choice of this IP address and port was made for compatibility with
+OpenStack's and Amazon EC2's ways of providing metadata to the instance.
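
As a quick sanity check of the addressing described above, the following sketch uses Python's standard ``ipaddress`` module to confirm that the instance address and the host address fall in the same link-local /24 network. Only the addresses and netmask come from the text; the variable names are illustrative.

```python
import ipaddress

# Instance side of the communication interface, as given in the design.
instance = ipaddress.ip_interface("169.254.169.1/255.255.255.0")
# Host side, where the metadata HTTP server listens on port 80.
host = ipaddress.ip_address("169.254.169.254")

print(instance.network)                # network derived from IP and netmask
print(host in instance.network)        # host reachable without routing
print(instance.network.is_link_local)  # whole network is in 169.254.0.0/16
```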
+
+Where possible, the metadata will be provided in a way compatible with OpenStack
+at::
+
+  http://169.254.169.254/openstack/<version>/meta_data.json
+
+or with Amazon EC2, at::
+
+  http://169.254.169.254/<version>/meta-data/*
+
+If some metadata are Ganeti-specific and don't fit this structure, they will be
+provided at::
+
+  http://169.254.169.254/<version>/ganeti/meta_data.json
+
+``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
+the most recent available protocol version.
+
+A bi-directional, pipe-like communication channel will be provided. The instance
+will be able to receive data from the host by a GET request at::
+
+  http://169.254.169.254/<version>/ganeti/pipe_in
+
+and to send data to the host by a POST request at::
+
+  http://169.254.169.254/<version>/ganeti/pipe_out
+
+As in a pipe, once the data are read, they will not be in the buffer anymore, so
+subsequent GET requests to ``pipe_in`` will not return the same data twice.
+Unlike a pipe, though, it will not be possible to perform blocking I/O
+operations.
+
+The OS parameters will be accessible through a GET
+request at::
+
+  http://169.254.169.254/<version>/ganeti/os/parameters/<visibility>.json
+
+as a JSON-serialized dictionary. ``<visibility>`` will be one of ``public``,
+``private`` or ``secret``.
+
+The installation scripts to be run inside the virtualized environment while the
+instance is run in ``install`` mode will be available at::
+
+  http://169.254.169.254/<version>/ganeti/os/scripts/<script_name>
+
+where ``<script_name>`` is the name of the script.
+
+The host and the instances (as detailed in `Installation process in a
+virtualized environment`_) will be able to create other communication channels
+on the other ports of the same IP address.
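
The Ganeti-specific URL layout and the read-once, non-blocking behaviour of ``pipe_in`` described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names (``ganeti_url``, ``os_parameters_url``, ``ReadOncePipe``) are hypothetical, the date used in the usage lines is a made-up protocol version, and the pipe is an in-memory model rather than an HTTP implementation.

```python
# Assumption: helper names are hypothetical; only the URL layout and the
# read-once semantics come from the design text.
METADATA_HOST = "http://169.254.169.254"
VISIBILITIES = ("public", "private", "secret")


def ganeti_url(path, version="latest"):
    """Build a Ganeti-specific metadata URL for the given protocol version."""
    return "%s/%s/ganeti/%s" % (METADATA_HOST, version, path)


def os_parameters_url(visibility, version="latest"):
    """Build the URL of the OS parameters dictionary for one visibility."""
    if visibility not in VISIBILITIES:
        raise ValueError("unknown visibility: %r" % (visibility,))
    return ganeti_url("os/parameters/%s.json" % visibility, version)


class ReadOncePipe:
    """In-memory model of ``pipe_in``: reads never block and drain the buffer."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        """Append data sent by the other side."""
        self._buffer.extend(data)

    def read(self):
        """Return everything buffered so far (possibly nothing) and clear it."""
        data = bytes(self._buffer)
        self._buffer.clear()
        return data


pipe = ReadOncePipe()
pipe.write(b"step 1 done")
first, second = pipe.read(), pipe.read()  # second read finds an empty buffer
print(os_parameters_url("private"))
print(ganeti_url("pipe_out", "2013-07-01"))  # made-up version date
```

Returning an empty result instead of waiting is one way to model the "no blocking I/O" rule. The design text does not say what an actual ``pipe_in`` request returns when no data is pending.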
+
+
+Rationale
+---------
+
+The choice of using a network interface for instance-host communication, as
+opposed to VirtIO, XenBus or other methods, is due to the desire to have a
+generic, hypervisor-independent way of creating a communication channel that
+doesn't require unusual (para)virtualization drivers.
+At the same time, a network interface was preferred over solutions involving
+virtual floppy or USB devices because the latter tend to be detected and
+configured by the guest operating systems, sometimes even in prominent positions
+in the user interface, whereas it is fairly common to have an unconfigured
+network interface in a system, usually without any negative side effects.
+
+
+Installation process in a virtualized environment
++++++++++++++++++++++++++++++++++++++++++++++++++
+
+In the new OS installation scenario, we distinguish between trusted and
+untrusted code.
+
+The trusted installation code maintains the behavior of the current one, with
+the scripts running on the node the instance is being created on. The untrusted
+code is stored in a subdirectory of the OS definition called ``untrusted``.
+This directory contains scripts that are equivalent to the already existing
+ones (``create``, ``export``, ``import``, ``rename``) but that will be run
+inside a virtualized environment, to protect the host from malicious tampering.
+
+The ``untrusted`` code is meant to either be untrusted itself, or to be trusted
+code running operations that might be dangerous (such as mounting a
+user-provided image).
+
+In order to allow for the highest flexibility, if both a trusted and an
+untrusted script are provided for the same operation (e.g. ``create``), both of
+them will be executed at the same time, one on the host, and one inside the
+installation appliance. They will be allowed to communicate with each other
+through the already described communication mechanism, in order to orchestrate
+their execution (e.g.: the untrusted code might execute the installation, while
+the trusted one receives status updates from it and delivers them to a user
+interface).
+
+Ganeti will provide a script to be run at install time that can be used to
+create the virtualized environment that will perform the OS installation of new
+instances.
+This script will build a debootstrapped basic Debian system including
+software that will read the metadata, set up the environment variables and
+launch the installation scripts inside the virtualized environment. The script
+will also provide hooks for personalization.
+
+It will also be possible to use other self-made virtualized environments, as long
+as they connect to Ganeti over the described communication mechanism and they
+know how to read and use the provided metadata to create a new instance.
+
+While performing an installation in the virtualized environment, a
+personalizable timeout will be used to detect possible problems with the
+installation process, and to kill the virtualized environment.
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
-- 
1.7.10.4
