On Wed, Dec 11, 2013 at 4:10 PM, Jose A. Lopes <[email protected]> wrote:
> On Mon, Dec 09, 2013 at 10:30:17AM +0100, Michele Tartara wrote:
>> Add the document describing a new design for the OS installation process for
>> new instances.
>>
>> Signed-off-by: Michele Tartara <[email protected]>

LGTM

Thanks,

Guido

>> ---
>>  doc/design-draft.rst |   1 +
>>  doc/design-os.rst    | 399 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 400 insertions(+)
>>  create mode 100644 doc/design-os.rst
>>
>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>> index c821292..3ed3852 100644
>> --- a/doc/design-draft.rst
>> +++ b/doc/design-draft.rst
>> @@ -20,6 +20,7 @@ Design document drafts
>>     design-daemons.rst
>>     design-hsqueeze.rst
>>     design-ssh-ports.rst
>> +   design-os.rst
>>
>>  .. vim: set textwidth=72 :
>>  .. Local Variables:
>> diff --git a/doc/design-os.rst b/doc/design-os.rst
>> new file mode 100644
>> index 0000000..a26801a
>> --- /dev/null
>> +++ b/doc/design-os.rst
>> @@ -0,0 +1,399 @@
>> +===============================
>> +Ganeti OS installation redesign
>> +===============================
>> +
>> +.. contents:: :depth: 3
>> +
>> +This is a design document detailing a new OS installation procedure, more
>> +secure, able to provide more features and easier to use for many common tasks
>> +w.r.t. the current one.
>> +
>> +Current state and shortcomings
>> +==============================
>> +
>> +As of Ganeti 2.10, each instance is associated with an OS definition. An OS
>> +definition is a set of scripts (``create``, ``export``, ``import``, ``rename``)
>> +that are executed with root privileges on the primary host of the instance to
>> +perform all the OS-related functionality (setting up an operating system inside
>> +the disks of the instance being created, exporting/importing the instance,
>> +renaming it).
>> +
>> +These scripts receive, as environment variables, a fixed set of parameters
>> +describing the instance (such as the hypervisor, the name of the instance, the
>> +number of disks, and their location) and a set of user defined parameters.
>> +Each of these parameters is also written into the configuration file of Ganeti, to
>> +allow for future reinstalls of the instance, and in various log files, namely:
>> +
>> +* node daemon log file: contains DEBUG strings of the ``/os_validate``,
>> +  ``/instance_os_add`` and ``/instance_start`` RPC calls.
>> +
>> +* master daemon log file: DEBUG strings related to the same RPC calls are stored
>> +  here as well.
>> +
>> +* commands log: the CLI commands that create a new instance, including their
>> +  parameters, are logged here.
>> +
>> +* RAPI log: the RAPI commands that create a new instance, including their
>> +  parameters, are logged here.
>> +
>> +* job logs: the job files stored in the job queue or in its archive contain the
>> +  parameters.
>> +
>> +The current situation presents a number of shortcomings:
>> +
>> +* Having the installation scripts run with root privileges on the nodes doesn't
>> +  allow user-defined OS scripts, as they would pose a huge security issue.
>> +  Furthermore, even a script without malicious intentions might end up
>> +  disrupting a node because of a bug in it.
>> +
>> +* Ganeti cannot be used to create instances starting from user provided disk
>> +  images: even in the (hypothetical) case where the scripts are completely
>> +  secure and run not by root but by an unprivileged user with only the power to
>> +  mount arbitrary files as disk images, this is a security issue. It has been
>> +  proven that a carefully crafted file system might exploit kernel
>> +  vulnerabilities to gain control of the system. Therefore, directly mounting
>> +  images on the Ganeti nodes is not an option.
>> +
>> +* There is no way to inject files into an existing disk image. A common use case
>> +  is for the system administrator to provide a standard image of the system, to
>> +  be later personalized with the network configuration, private keys identifying
>> +  the machine, ssh keys of the users and so on.
>> +  A possible workaround would be for the scripts to mount the image (only if
>> +  this is trusted!) and to receive
>> +  the configurations and ssh keys as user defined OS parameters. Unfortunately,
>> +  this is also not an option for security sensitive material (such as the ssh
>> +  keys) because the OS parameters are stored in many places on the system, as
>> +  already described above.
>> +
>> +* Most other virtualization software simply works with instance images, not with
>> +  installation scripts. This difference makes the interaction of Ganeti with
>> +  other software difficult.
>> +
>> +Proposed changes
>> +================
>> +
>> +In order to fix the shortcomings of the current state, we plan to introduce the
>> +following changes:
>> +
>> +* Change the OS parameters to have three categories:
>> +
>> +  * ``public``: the current behavior. The parameter is logged and stored freely.
>> +
>> +  * ``private``: the parameter is saved inside the Ganeti configuration (to allow
>> +    for instance reinstall) but it is not shown in logs, job logs, or passed back
>> +    via RAPI.
>> +
>> +  * ``secret``: the parameter is not saved inside the Ganeti configuration.
>> +    Reinstalls are impossible unless the data is passed again. The parameter will
>> +    not appear in any log file. When a functionality is performed jointly by
>> +    multiple daemons (such as MasterD and LuxiD), currently Ganeti sometimes
>> +    serializes jobs on disk and later reloads them. Secret parameters will not be
>> +    serialized on disk. They will be passed around as part of the LUXI calls
>> +    exchanged by the daemons, and only kept in memory, in order to reduce their
>> +    accessibility as much as possible. In case of a failure of the master node,
>> +    these parameters will be lost and cannot be recovered because they are not
>> +    serialized on file, therefore the job cannot be taken over by the new master.
>> +    This is an expected and accepted side effect of jobs with secret parameters:
>> +    if they fail, they'll have to be restarted manually.
>> +
>> +* A new OS installation procedure, based on a safe virtualized environment.
>> +  This virtualized environment will run with the same hardware parameters as the
>> +  actual instance being installed, as much as possible. This will also allow
>> +  reducing the memory usage in the host (specifically, in Dom0 for Xen
>> +  installations). Each instance will have these possible execution modes:
>> +
>> +  * ``default``: the default mode, used when the machine is running normally and
>> +    the OS installation procedure is run before starting the instance for the
>> +    first time.
>
> Is this supposed to be ``run`` instead of ``default``? Here it says
> default, but the rest of the document keeps mentioning the ``run`` mode,
> which doesn't seem to be anywhere.
>
> Thanks,
> Jose
>
>> +
>> +  * ``self_install``: the first run of the instance will be with a different set
>> +    of parameters w.r.t. all the successive runs. This set of "install
>> +    parameters" will allow, e.g., to attach an installation
>> +    floppy/cdrom/network, change the boot device order, or specify an OS image
>> +    to be used. Through this set of parameters, the administrator will have to
>> +    provide the hypervisor a way to find an installation medium for the instance
>> +    (e.g., a boot disk, a network image, etc). This medium will then install the
>> +    instance itself on the disks and will then be responsible to get the
>> +    parameters for configuring it (its network interfaces, IP address, hostname,
>> +    etc.) from a set of metadata provided by Ganeti (e.g.: using an approach
>> +    comparable to the one of the ``cloud-init`` tool). When this installation
>> +    mode is used, no OS installation script is required.
>> +    In order for installation of an OS from an image to be possible, the
>> +    ``--os-type`` parameter will be extended to support a new additional format:
>> +    ``--os-type image:<URL>`` will instruct Ganeti to take an image from the
>> +    specified location. For the initial implementation, URL can be either a
>> +    filename or a publicly accessible http or ftp resource. Once the instance
>> +    image is received, it will be dd-ed onto the first disk of the instance.
>> +    When an image is specified, ``--os-parameters`` can still be used,
>> +    and its content will be passed to the instance as part of the metadata. Note
>> +    that as part of the OS scripts there is a file specifying what parameters
>> +    are expected. With OS images, though, none of the traditional structure of
>> +    OS scripts is in place, so there will be no check regarding what parameters
>> +    can be specified: they will all be passed, as long as the
>> +    ``--os-parameters`` string is syntactically valid.
>> +    The set of ``self_install`` parameters will be stored as part of the
>> +    instance configuration, so that they can be used to reinstall the instance.
>> +    It will be the user's responsibility to ensure that the OS image or any
>> +    installation media is still available in the proper location when a
>> +    reinstall happens. After the first run, the instance will revert to
>> +    ``default`` mode.
>> +
>> +  * ``install``: Ganeti will start the instance using a virtual appliance
>> +    specifically made for installing Ganeti instances. Scripts analogous to the
>> +    current ones will run inside this instance. The disks of the instance being
>> +    installed will be connected to this virtual appliance, so that the scripts
>> +    can mount them and modify them as needed, as currently happens, but with the
>> +    additional protection given by this happening in a VM.
>> +    The disk of the virtual appliance will be read only, so that a pristine copy
>> +    of the appliance can be started every time a new instance needs to be
>> +    created, to further increase security. The data the instance needs to write
>> +    at runtime will only be stored in RAM, and disappear as soon as the instance
>> +    is stopped. Metadata will also be provided to this virtual appliance, which
>> +    will take care of converting them to environment variables for the
>> +    installation scripts. After the first run, the instance will revert to
>> +    ``default`` mode.
>> +
>> +* In order to allow for the metadata to be sent inside the instance, a
>> +  communication mechanism between the instance and the host will be created.
>> +  This mechanism will be bidirectional (e.g.: to allow the setup process going
>> +  on inside the instance to communicate its progress to the host). Each instance
>> +  will have access exclusively to its own metadata, and it will be only able to
>> +  communicate with its host over this channel. More details will be provided in
>> +  the `Communication mechanism and metadata service`_ section.
>> +
>> +* As part of the instance creation command it will be possible to indicate a URL
>> +  for a "personalization package", that is an archive containing a set of files
>> +  meant to be overlaid on top of the operating system file system at the end of
>> +  the setup process, before the VM is started for the first time in ``run``
>> +  mode. Ganeti will provide a mechanism for receiving and unpacking this
>> +  archive as part of the ``install`` execution mode, whereas in ``self_install``
>> +  mode it will only be provided as metadata for the instance to use. The
>> +  archive will be in TAR-GZIP format (with extension ``.tar.gz`` or ``.tgz``)
>> +  and will contain the files according to the directory structure that will be
>> +  recreated on the installation disk.
>> +  Files contained in this archive will overwrite files with the same path
>> +  created during the install procedure (if
>> +  any). The URL of the "personalization package" will have to specify an
>> +  extension to identify the file format (in order to allow for more formats to be
>> +  supported in the future). The URL will be stored as part of the configuration
>> +  of the instance (therefore, the URL should not contain confidential
>> +  information, but the file available there can). It is up to the system
>> +  administrator to ensure that a package is actually available at that URL at
>> +  install and reinstall time. The content of the package is allowed to change.
>> +  E.g.: a system administrator might create a package containing the private
>> +  keys of the instance being created. When the instance is reinstalled, a new
>> +  package with new keys can be made available there, therefore allowing instance
>> +  reinstall without the need to store keys. Together with the URL, a username
>> +  and a password can be specified too. If the URL is an http(s) URL, they will be
>> +  used as basic access authentication credentials to access that URL. The
>> +  username and password will not be saved in the config, and will have to be
>> +  provided again in case a reinstall is requested. The downloaded
>> +  personalization package will not be stored locally on the node for longer than
>> +  it is needed while unpacking it and adding its files to the instance being
>> +  created. The personalization package will be overlaid on top of the instance
>> +  filesystem after the scripts that created it have been executed. In order for
>> +  the files in the package to be automatically overlaid on top of the instance
>> +  filesystem it is required that the appliance is actually able to mount the
>> +  instance disks, therefore this will not work for every filesystem.
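
As a reading aid for the three parameter categories proposed above, here is a minimal Python sketch of how ``public``, ``private`` and ``secret`` values could be filtered before being logged or saved. The function names and the ``(value, visibility)`` tuple used as input are assumptions made for illustration only; none of this is taken from the Ganeti code base.

```python
from typing import Dict, Tuple

# Hypothetical constants: the design only names the three categories.
PUBLIC, PRIVATE, SECRET = "public", "private", "secret"

# Assumed layout: parameter name -> (value, visibility)
OsParams = Dict[str, Tuple[str, str]]


def params_for_log(params: OsParams) -> Dict[str, str]:
    """Only public parameters may show up in logs, job logs or RAPI output."""
    return {name: value
            for name, (value, vis) in params.items() if vis == PUBLIC}


def params_for_config(params: OsParams) -> Dict[str, str]:
    """Public and private parameters are saved; secret ones never are."""
    return {name: value
            for name, (value, vis) in params.items()
            if vis in (PUBLIC, PRIVATE)}


def can_reinstall_from_config(params: OsParams) -> bool:
    """A reinstall needs the secret values to be passed again by the user."""
    return not any(vis == SECRET for _, vis in params.values())
```

With one parameter of each kind, only the public one would reach the logs, the public and private ones would be saved in the configuration, and a reinstall would require the secret value to be supplied again.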
>> +
>> +Implementation
>> +==============
>> +
>> +The implementation of this design will happen as an ordered sequence of steps,
>> +of increasing impact on the system and, in some cases, dependent on each other:
>> +
>> +#. Private and secret instance parameters
>> +#. Communication mechanism between host and instance
>> +#. Metadata service
>> +#. Personalization package (inside a virtualization environment)
>> +#. ``self_install`` mode
>> +#. ``install`` mode (inside a virtualization environment)
>> +
>> +Some of these steps need to be more deeply specified w.r.t. what is already
>> +written in the `Proposed changes`_ Section. Extra details will be provided in
>> +the following Subsections.
>> +
>> +Communication mechanism and metadata service
>> +++++++++++++++++++++++++++++++++++++++++++++
>> +
>> +The communication mechanism and the metadata service are described together
>> +because they are deeply tied. On the other hand, the communication mechanism
>> +will need to be more generic because it can be used for other reasons in the
>> +future (like allowing instances to explicitly send commands to Ganeti, or to let
>> +Ganeti control a helper instance, like the one hereby introduced for performing
>> +OS installs inside a safe environment).
>> +
>> +The communication mechanism will be enabled automatically when the instance is
>> +in ``self_install`` or ``install`` mode, but for backwards compatibility it will
>> +be disabled when the instance is in ``run`` mode unless it is explicitly
>> +requested. Specifically, a new parameter ``--communication`` (short version:
>> +``-C``), with possible values ``true`` or ``false`` will be added to
>> +``gnt-instance add`` and ``gnt-instance modify``. It will determine whether the
>> +instance will have a communication channel set up to interact with the host and
>> +to receive metadata. The value of this parameter will be saved as part of the
>> +configuration of the instance.
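
The enabling rule described above can be summarized in a few lines of Python. This is only a sketch: the function name is hypothetical, and letting the two install modes override an explicit ``false`` is one possible reading of the text, not something the design states.

```python
from typing import Optional

# Modes in which the design says the channel is enabled automatically.
INSTALL_MODES = ("self_install", "install")


def communication_enabled(mode: str, requested: Optional[bool] = None) -> bool:
    """Decide whether the host/instance channel is set up.

    ``requested`` models the value of ``--communication`` (None = not given).
    Assumption: install modes always enable the channel; in any other mode
    it stays off unless explicitly requested.
    """
    if mode in INSTALL_MODES:
        return True
    return bool(requested)
```

Under this reading, an instance in ``run`` mode created without ``--communication`` keeps today's behavior and gets no extra interface.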
>> +
>> +When the communication mechanism is enabled, Ganeti will create a new network
>> +interface inside the instance. This extra network interface will be the last one
>> +of the instance, after all the user defined ones. On the host side, this
>> +interface will be only accessible to the host itself, and not be routed outside
>> +the machine.
>> +On this network interface, the instance will connect using the IP:
>> +169.254.169.1 and netmask 255.255.255.0.
>> +The host will be on the same network, with the IP address: 169.254.169.254.
>> +
>> +The way to create this interface depends on the specific hypervisor being used.
>> +In KVM, it is possible to create a network interface inside the instance without
>> +having a corresponding interface created on the host. Using a command like::
>> +
>> +  kvm -net nic -net \
>> +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
>> +    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080
>> +
>> +a network interface will be created inside the VM, part of the 169.254.169.0/24
>> +network, where the VM will have IP address .253 and the host port 8080 will be
>> +reachable on port 80.
>> +
>> +In Xen, unfortunately, such a capability is not present, and an actual network
>> +interface has to be created on the host (using the ``vif`` parameter of the Xen
>> +configuration file). Each instance will have its corresponding ``vif`` network
>> +interface on the host. These interfaces will not be connected to each other in
>> +any way, and Ganeti will not configure them to allow traffic to be forwarded
>> +beyond the host machine. The ``vif-route`` script of Xen might be helpful in
>> +implementing this.
>> +It will be up to the system administrator to ensure that extra firewalling and
>> +routing rules specified on the host don't allow this accidentally.
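
The addressing described above can be checked with Python's standard ``ipaddress`` module. The sketch below only verifies that the instance address (169.254.169.1 with netmask 255.255.255.0) and the host address (169.254.169.254) share one network and are link-local; the helper name is made up for this example.

```python
import ipaddress

# Addresses taken from the design text.
GUEST_IF = ipaddress.ip_interface("169.254.169.1/255.255.255.0")
HOST_ADDR = ipaddress.ip_address("169.254.169.254")


def on_same_link(guest_if: ipaddress.IPv4Interface,
                 host_addr: ipaddress.IPv4Address) -> bool:
    """True if the host address lies inside the guest interface's network."""
    return host_addr in guest_if.network
```

Both addresses fall in 169.254.0.0/16, the IPv4 link-local range, which fits the requirement that this traffic is never routed beyond the host.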
>> +
>> +The instance will be able to connect to 169.254.169.254:80, and issue GET
>> +requests to an HTTP server that will provide the instance metadata.
>> +
>> +The choice of this IP address and port for accessing the metadata is done for
>> +compatibility reasons with OpenStack's and Amazon EC2's ways of providing
>> +metadata to the instance. The metadata will be provided by a single daemon,
>> +which will determine what instance the request comes from and reply with the
>> +metadata specific for that instance.
>> +
>> +Where possible, the metadata will be provided in a way compatible with Amazon
>> +EC2, at::
>> +
>> +  http://169.254.169.254/<version>/meta-data/*
>> +
>> +If some metadata are Ganeti-specific and don't fit this structure, they will be
>> +provided at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/meta_data.json
>> +
>> +``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
>> +the most recent available protocol version.
>> +
>> +If needed in the future, this structure also allows us to support OpenStack's
>> +metadata at::
>> +
>> +  http://169.254.169.254/openstack/<version>/meta_data.json
>> +
>> +A bi-directional, pipe-like communication channel will be provided. The instance
>> +will be able to receive data from the host by a GET request at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/read
>> +
>> +and to send data to the host by a POST request at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/write
>> +
>> +As in a pipe, once the data are read, they will not be in the buffer anymore, so
>> +subsequent GET requests to ``read`` will not return the same data twice.
>> +Unlike a pipe, though, it will not be possible to perform blocking I/O
>> +operations.
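
To make the pipe-like semantics above concrete, here is a small in-memory model in Python. It is not the proposed daemon: the class and helper names are hypothetical, and only the behavior taken from the text is modelled (data is consumed when read, and a read on an empty buffer returns at once instead of blocking).

```python
METADATA_HOST = "169.254.169.254"  # address given in the design


def ganeti_endpoint(version: str, name: str) -> str:
    """Build a URL such as .../ganeti/<version>/read or .../write."""
    return f"http://{METADATA_HOST}/ganeti/{version}/{name}"


class HostToGuestBuffer:
    """Models what a guest would see through repeated GETs on ``read``."""

    def __init__(self) -> None:
        self._chunks = []

    def write(self, data: bytes) -> None:
        """Host side: queue data for the guest."""
        self._chunks.append(data)

    def read(self) -> bytes:
        """Guest side: return everything queued so far and drain the buffer.

        Never blocks: an empty buffer simply yields b"".
        """
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data
```

A guest polling ``read`` would therefore have to treat an empty reply as "nothing new yet" and retry later, since the channel gives it no way to wait for data.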
>> +
>> +The OS parameters will be accessible through a GET
>> +request at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/os/parameters.json
>> +
>> +as a JSON serialized dictionary having the parameter name as the key, and the
>> +pair ``(<value>, <visibility>)`` as the value, where ``<value>`` is the
>> +user-provided value of the parameter, and ``<visibility>`` is either ``public``,
>> +``private`` or ``secret``.
>> +
>> +The installation scripts to be run inside the virtualized environment while the
>> +instance is run in ``install`` mode will be available at::
>> +
>> +  http://169.254.169.254/<version>/ganeti/os/scripts/<script_name>
>> +
>> +where ``<script_name>`` is the name of the script.
>> +
>> +
>> +Rationale
>> +---------
>> +
>> +The choice of using a network interface for instance-host communication, as
>> +opposed to VirtIO, XenBus or other methods, is due to the desire to have a
>> +generic, hypervisor-independent way of creating a communication channel that
>> +doesn't require unusual (para)virtualization drivers.
>> +At the same time, a network interface was preferred over solutions involving
>> +virtual floppy or USB devices because the latter tend to be detected and
>> +configured by the guest operating systems, sometimes even in prominent positions
>> +in the user interface, whereas it is fairly common to have an unconfigured
>> +network interface in a system, usually without any negative side effects.
>> +
>> +
>> +Installation process in a virtualized environment
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>> +
>> +In the new OS installation scenario, we distinguish between trusted and
>> +untrusted code.
>> +
>> +The trusted installation code maintains the behavior of the current one and
>> +requires no modifications, with the scripts running on the node the instance is
>> +being created on. The untrusted code is stored in a subdirectory of the OS
>> +definition called ``untrusted``.
>> +This directory contains scripts that are
>> +equivalent to the already existing ones (``create``, ``export``, ``import``,
>> +``rename``) but that will be run inside a virtualized environment, to protect
>> +the host from malicious tampering.
>> +
>> +The ``untrusted`` code is meant to either be untrusted itself, or to be trusted
>> +code running operations that might be dangerous (such as mounting a
>> +user-provided image).
>> +
>> +By default, all new OS definitions will have to be explicitly marked as trusted
>> +by the cluster administrator (with a new ``gnt-os modify`` command) before they
>> +can run code on the host. Otherwise, only the untrusted part of the code will be
>> +allowed to run, inside the virtual appliance. For backwards compatibility
>> +reasons, when upgrading an existing cluster, all the installed OSes will be
>> +marked as trusted, so that they can keep running with no changes.
>> +
>> +In order to allow for the highest flexibility, if both a trusted and an
>> +untrusted script are provided for the same operation (e.g. ``create``), both of
>> +them will be executed at the same time, one on the host, and one inside the
>> +installation appliance. They will be allowed to communicate with each other
>> +through the already described communication mechanism, in order to orchestrate
>> +their execution (e.g.: the untrusted code might execute the installation, while
>> +the trusted one receives status updates from it and delivers them to a user
>> +interface).
>> +
>> +The cluster administrator will have an option to completely disable scripts
>> +running on the host, leaving only the ones running in the VM.
>> +
>> +Ganeti will provide a script to be run at install time that can be used to
>> +create the virtualized environment that will perform the OS installation of new
>> +instances.
>> +This script will build a debootstrapped basic Debian system including
>> +software that will read the metadata, set up the environment variables and
>> +launch the installation scripts inside the virtualized environment. The script
>> +will also provide hooks for personalization.
>> +
>> +It will also be possible to use other self-made virtualized environments, as long
>> +as they connect to Ganeti over the described communication mechanism and they
>> +know how to read and use the provided metadata to create a new instance.
>> +
>> +While performing an installation in the virtualized environment, a
>> +configurable timeout will be used to detect possible problems with the
>> +installation process, and to kill the virtualized environment. The timeout will
>> +be optional and set on a cluster basis by the administrator. If set, it will be
>> +the total time allowed to set up an instance inside the appliance. It is mainly
>> +meant as a safety measure to prevent an instance taken over by malicious scripts
>> +from being available for a long time.
>> +
>> +.. vim: set textwidth=72 :
>> +.. Local Variables:
>> +.. mode: rst
>> +.. fill-column: 72
>> +.. End:
>> --
>> 1.8.5.1
>>
>
> --
> Jose Antonio Lopes
> Ganeti Engineering
> Google Germany GmbH
> Dienerstr. 12, 80331, München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores
> Steuernummer: 48/725/00206
> Umsatzsteueridentifikationsnummer: DE813741370

--
Guido Trotter
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München
