On Thu, Nov 14, 2013 at 5:11 PM, Vangelis Koukis <[email protected]> wrote: > On Tue, Nov 12, 2013 at 11:41:05am +0000, Michele Tartara wrote: >> Add the document describing a new design for the OS installation process for >> new instances. >> > > Hello Michele, list, > > This is great work, nicely written, and easy to follow :) > Some comments follow inline. > >> Signed-off-by: Michele Tartara <[email protected]> >> --- >> doc/design-draft.rst | 1 + >> doc/design-os.rst | 318 >> ++++++++++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 319 insertions(+) >> create mode 100644 doc/design-os.rst >> >> diff --git a/doc/design-draft.rst b/doc/design-draft.rst >> index c821292..3ed3852 100644 >> --- a/doc/design-draft.rst >> +++ b/doc/design-draft.rst >> @@ -20,6 +20,7 @@ Design document drafts >> design-daemons.rst >> design-hsqueeze.rst >> design-ssh-ports.rst >> + design-os.rst >> >> .. vim: set textwidth=72 : >> .. Local Variables: >> diff --git a/doc/design-os.rst b/doc/design-os.rst >> new file mode 100644 >> index 0000000..7a42a7f >> --- /dev/null >> +++ b/doc/design-os.rst >> @@ -0,0 +1,318 @@ >> +=============================== >> +Ganeti OS installation redesign >> +=============================== >> + >> +.. contents:: :depth: 3 >> + >> +This is a design document detailing a new OS installation procedure, more >> +secure, able to provide more features and easier to use for many common >> tasks >> +w.r.t. the current one. >> + >> +Current state and shortcomings >> +============================== >> + >> +As of Ganeti 2.10, each instance is associated with an OS definition. An OS >> +definition is a set of scripts (``create``, ``export``, ``import``, >> ``rename``) >> +that are executed with root privileges on the primary host of the instance >> to >> +perform all the OS-related functionality (setting up an operating system >> inside >> +the disks of the instance being created, exporting/importing the instance, >> +renaming it). 
>> + >> +These scripts receive, as environment variables, a fixed set of parameters >> +describing the instance (such as the hypervisor, the name of the instance, >> the >> +number of disks, and their location) and a set of user defined parameters. >> Each >> +of these parameters is also written into the configuration file of Ganeti, >> to >> +allow for future reinstalls of the instance, and in various log files, >> namely: >> + >> +* node daemon log file: contains DEBUG strings of the ``/os_validate``, >> + ``/instance_os_add`` and ``/instance_start`` RPC calls. >> + >> +* master daemon log file: DEBUG strings related to the same RPC calls are >> stored >> + here as well. >> + >> +* commands log: the CLI commands that create a new instance, including their >> + parameters, are logged here. >> + >> +* RAPI log: the RAPI commands that create a new instances, including their >> + parameters, are logged here. >> + >> +* job logs: the job files stored in the job queue or in its archive contain >> the >> + parameters. >> + >> +The current situation presents a number of shortcomings: >> + >> +* Having the installation scripts run with root power on the nodes is a huge >> + security issue. >> + >> +* Ganeti cannot be used to create instances starting from user provided disk >> + images: even in the (hypothetical) case where the scripts are completely >> + secure and run not by root but by an unprivileged user with only the >> power to >> + mount arbitrary files as disk images, this is a security issue. It has >> been >> + proven that a carefully crafted file system might exploit kernel >> + vulnerabilities to gain control of the system. Therefore, directly >> mounting >> + images on the Ganeti nodes is not an option. >> + >> +* There is no way to inject files into an existing disk image. 
A common use >> case >> + is for the system administrator to provide a standard image of the >> system, to >> + be later personalized with the network configuration, private keys >> identifying >> + the machine, ssh keys of the users and so on. A possible workaround would >> be >> + for the scripts to mount the image (only if this is trusted!) and to >> receive >> + the configurations and ssh keys as user defined OS parameters. >> Unfortunately, >> + this is also not an option for security sensitive material (such as the >> ssh >> + keys) because the OS parameters are stored in many places on the system, >> as >> + already described above. >> + >> +* Most other virtualization software simply work with instance images, not >> with >> + installation scripts. This difference makes the interaction of Ganeti with >> + other softwares difficult. >> + >> +Proposed changes >> +================ >> + >> +In order to fix the shortcomings of the current state, we plan to introduce >> the >> +following changes: >> + >> +* Change the OS parameters to have three categories: >> + >> + * ``public``: the current behavior. The parameter is logged and stored >> freely. >> + >> + * ``private``: the parameter is saved inside the Ganeti configuration (to >> allow >> + for instance reinstall) but it is not shown in logs, job logs, or passed >> back >> + via RAPI. >> + >> + * ``secret``: the parameter is not saved inside the Ganeti configuration. >> + Reinstall are impossible unless the data is passed again. The parameter >> will >> + not appear in any log file. In order to preserve the functionality of >> Ganeti, >> + the parameters will still need to be stored in the job files, but they >> will >> + be removed from there when the job has finished running (either >> successfully >> + or not). >> + > > +1000. :) > >> +* A new OS installation procedure, based on a safe virtualized environment. 
>> + This virtualized environment will run with the same hardware parameter as >> the >> + actual instance being installed, as much as possible. This will also >> allow to >> + reduce the memory usage in the host (specifically, in Dom0 for Xen >> + installations). Each instance will have these possible execution modes: >> + >> + * ``run``: the default mode, used when the machine is running normally. >> + >> + * ``self_install``: Ganeti will start the instance with a different set of >> + user-specified parameters, therefore allowing to attach an installation >> + floppy/cdrom/network, change the boot device order, or specify an OS >> image >> + to be used. The instance will then be responsible to get the parameters >> for >> + configuring itself (its network interfaces, IP address, hostname, etc.) >> from >> + a set of metadata provided to it by Ganeti (e.g.: using an approach >> + comparable to the one of the ``cloud-init`` tool). When this >> installation >> + mode is used, no OS installation script is required. >> + In order for installation of an OS from an image to be possible, a new >> + parameter ``--os-image`` will be added, allwoing to specify where to >> take >> + the image from. It will have to be mutually exclusive with >> ``--os-type``. If > > Minor typo, "allwoing"->"allowing".
Will fix, thanks.

>
> Who will be responsible to "take the image" from the location specified
> as --os-image? If I understand correctly, it will be up to the instance
> running in ``self_install`` mode to interpret the meaning of --os-image
> (e.g., it's a URL) and do all necessary tasks to deploy it, e.g.,
> wget'ing it and dd'ing onto the final disk of the instance. If this is
> the case, it would be nice to mention it explicitly in the design doc.

Not really. This point was probably a bit unclear, so I'll try to
clarify it here (and, later, in the updated design).

``self_install`` mode is just a different set of parameters used to run
an instance the first time it is executed. These parameters will have to
be sufficient to make sure that an instance started with them can
actually do something. Typically, they will connect a cdrom image or a
network, so that when the (still uninitialized!) instance is booted, it
will find some kind of installation medium that will install (and, by
accessing the metadata, self-configure) the instance. At the next boot,
the instance will then be started with its default set of parameters.
Note that the parameters of the ``self_install`` mode will be stored in
the config, so that they can be used for a reinstall.

The --os-image parameter is related to this: when it is specified at
"gnt-instance add" time, Ganeti itself will access the URL, download (or
copy, depending on the resource) the file that is there, and dd it onto
the instance disk. Only at this point will the instance be booted. So,
the download and dd will not happen in the running instance.

>
>> + ``--os-image`` is specified, ``--os-parameters`` can still be used, as it
>> + will be passed to the instance as part of the metadata.
>> + The set of ``self_install`` parameters will be stored as part of the
>> + instance configuration, so that they can be used to reinstall the instance.
>> + It will be the user's responsibility to ensure that the OS image or any
>> + installation media is still available in the proper position when a
>> + reinstall happens.
>> +
>> + * ``install``: Ganeti will start the instance using a virtual appliance
>> + specifically made for installing Ganeti instances. Scripts analogous to the
>> + current ones will run inside this instance. The disks of the instance being
>> + installed will be connected to this virtual appliance, so that the scripts
>> + can mount them and modify them as needed, as currently happens, but with the
>> + additional protection given by this happening in a VM. The virtual appliance
>> + will be started in a clean state every time a new instance need to be
>> + created, to further increase security. Metadata will be provided also to
>> + this virtual applicance, that will take care of converting them to
>> + environment variables for the installation scripts.
>> +
>
> How will Ganeti create this virtual appliance? Will there be
> a predetermined Ganeti-provided appliance in a specific format, e.g., a
> raw disk, available at a specific location, e.g., under /var/lib/ganeti,
> on all nodes?

The idea is to provide a sample script able to create an "official
version" of such an appliance. Then, yes, the created appliance will
have to be stored on the nodes as you describe, at node setup time (a
bit like what happens today with the OS install scripts).

Other than that, everybody will be able to create their own installation
appliance: the only requirement will be that it is a VM image that, when
booted, creates an instance according to the indications (and the
install scripts, if provided) contained in the metadata. How this
actually happens is up to the sysadmin who creates the appliance.

>
> Also, how will the instance be able to access the contents of this
> appliance? Will it be its first connected disk?
If yes, will it be > connected as a VirtIO, IDE, SCSI disk? And if yes, how will this > appliance be usable by multiple instances running in ``install`` mode > concurrently? When in install mode, the virtual appliance will be started inside the context that (later) will be of the instance. The first disk will be the one of the appliance (mounted read only, so it can actually be mounted multiple times in parallel). The disks of the actual instance will also be connected to the running appliance, and the metadata will provide the mapping between the connected device and their final expected position in the to-be-installed instance (as the environment variables are doing today). The connection type of the disks will be the same that Ganeti already uses to connect disks to the running instances. > > What we currently do with snf-image, is that we run a special helper VM > to cope with tasks which in the new model could be presumably undertaken > by the instance running in ``install`` mode. To minimize the execution > time of our OS definition, we had snf-image provision the disks for > helper VMs thinly, as QCOW2 images based off a read-only base raw > disk under /var/lib/snf-image/helper. This allows helper VMs to have > write access to their virtual disk, we just keep a limited number of > changed blocks in memory and don't really care about them anyway. > > This allows us to always start helper VMs from a consistent, "clean" > state as you also say. > > We cannot really afford having to copy all of the virtual appliance data > to an actual (extra) disk for the instance, e.g., a DRBD disk, just for > running in ``install`` mode. Yes, this is basically what I have in mind too: read only image, and runtime data on top of it in ram. > > How does this match with Ganeti's (in)ability to have disks of different > type connected to the same VM? 
I assume the first disk of the instance > will be the appliance, and the second disk will be the actual disk of > the instance being created. Yes, your assumption is correct. The easiest albeit less elegant way would be to handle the virtual appliance separately, with its own specific code so that its specific disk can be added even if it's of a different type. Otherwise, I guess we'll have to look more into getting rid of the single disk type problem. >> +In order to allow for the metadata to be sent inside the instance, a >> +communication mechanism between the instance and the host will be created. >> This >> +mechanism will be bidirectional (e.g.: to allow the setup process going on >> +inside the instance to communicate its progress to the host). Each instance >> will >> +have access exclusively to its own metadata, and it will be only able to >> +communicate with its host over this channel. >> + >> +As part of the instance creation command it will be possible to indicate a >> URL >> +for a "personalization package", that is an archive containing a set of >> files >> +meant to be overlayed on top of the operating system file system at the end >> of >> +the setup process, before the VM is started for the first time in ``run`` >> mode. >> +Ganeti will provide a mechanism for receiving and unpacking this archive as >> part >> +of the ``install`` execution mode, whereas in ``self_install`` mode it will >> only >> +be provided as a metadata for the instance to use. > > I'm not sure I understand this. > Where will Ganeti unpack this archive? Will Ganeti be responsible for > unpacking it into the virtual appliance before starting it for example? > In any case, Ganeti won't probably be able to unpack it directly into > the final disk of the instance, because this would mean actually having > to make it aware of different filesystems (e.g., ext4 vs. btrfs), > partitioning schemes (e.g., MS-DOS partitions vs. BSD disklabels vs. LVM, > etc). Nice catch. 
I hadn't considered the fact that even if the virtual appliance has
access to the disks, it doesn't necessarily know how to mount them. Only
the scripts actually know that. So, either this feature is reduced to
just passing the URL in a specific, "official" position in the metadata,
or it is specified to work only on a restricted set of file systems that
the appliance can mount.

>
>> +The archive will be in TAR-GZIP format (with extension ``.tar.gz`` or ``.tgz``)
>> +and will contain the files according to the directory structure that will be
>> +recreated on the installation disk. Files contained in this archive will
>> +overwrite files with the same path created during the install procedure (if
>> +any).
>
> Assuming the tarball describes a set of files to be overlayed on an
> instance disk, does this mechanism suffice? For example, when injecting
> files it could make sense to define Access Control Lists for them, the
> format and contents of which may depend on the target filesystem (e.g.,
> an NTFS filesystem may have richer ACL semantics than can be represented
> by TAR-GZIP).
>
> I think it would be better to just provide a way for the instance
> running in ``install`` mode to retrieve the contents of the
> personalization package over the communication channel, and let it deal
> with it in whatever way it sees fit.

And this would be the "URL only" option I described. I'll have to
consider which approach is best, but currently I'm leaning towards
"URL only", as you suggest.

>
>> +The URL of the "personalization package" will have to specify an extesion to
>> +identify the file format (in order to allow for more formats to be supported in
>> +the future).
>> +The URL will be stored as part of the configuration of the instance (therefore,
>> +the URL should not contain confidential information, but the file there
>> +available can).
It is up to the system administrator to ensure that a >> package >> +is actually available at that URL at install and reinstall time. >> +The content of the package is allowed to change. E.g.: a system >> administrator >> +might create a package containing the private keys of the instance being >> +created. When the instance is reinstalled, a new package with new keys can >> be >> +made available there, therefore allowing instance reinstall without the >> need to >> +store keys. >> + >> +Implementation >> +============== >> + >> +The implementation of this design will happen as an ordered sequence of >> steps, >> +of increasing impact on the system and, in some cases, dependent on each >> other: >> + >> +#. Private and secret instance parameters >> +#. Communication mechanism between host and instance >> +#. Metadata service >> +#. Personalization package >> +#. ``self_install`` mode >> +#. ``install`` mode (with virtualization environment) >> + >> +Some of these steps need to be more deeply specified w.r.t. what is already >> +written in the `Proposed changes`_ Section. Extra details will be provided >> in >> +the following Subsections. >> + >> +Communication mechanism and metadata service >> +++++++++++++++++++++++++++++++++++++++++++++ >> + >> +The communication mechanism and the metadata service are described together >> +because they are deeply tied. On the other hand, the communication mechanism >> +will need to be more generic because it can be used for other reasons in the >> +future (like allowing instances to esplicitly send commands to Ganeti, or >> to let >> +Ganeti control a helper instance, like the one hereby introduced for >> performing >> +OS installs inside a safe environment). 
>> + >> +The communication mechanism will be enabled automatically when the instance >> is >> +in ``self_install`` or ``install`` mode, but for backwards compatibility it >> will >> +be disabled when the instance is in ``run`` mode unless it is esplicitly >> +requested at instance startup by using a new, ad-hoc, parameter >> +(``--communication``). >> + >> +When the communication mechanism is enabled, Ganeti will create a new >> network >> +interface inside the instance. This extra network interface will be the >> last one >> +of the instance, after all the user defined ones. On the host side, this >> +interface will be only accessible to the host itself, and not be routed >> outside >> +the machine. >> +On this network interface, the instance will connect using the IP: >> +169.254.169.1 and netmask 255.255.255.0. >> +The host will be on the same network, with the IP address: 169.254.169.254. >> +The instance will be able to connect to 169.254.169.254:80, and issue GET >> +requests to an HTTP server that will provide the instance metadata. >> + >> +The choice of this IP address and port is done for compatibility reasons >> with >> +OpenStack's and Amazon EC2's ways of providing metadata to the instance. >> + >> +Where possible, the metadata will be provided in a way compatible with >> OpenStack >> +at:: >> + >> + http://169.254.169.254/openstack/<version>/meta_data.json >> + >> +or with Amazon EC2, at:: >> + >> + http://169.254.169.254/<version>/meta-data/* >> + >> +If some metadata are Ganeti-specific and don't fit this structure, they >> will be >> +provided at:: >> + >> + http://169.254.169.254/<version>/ganeti/meta_data.json >> + >> +``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to >> indicate >> +the most recent available protocol version. >> + >> +A bi-directional, pipe-like communication channel will be provided. 
The >> instance >> +will be able to receive data from the host by a GET request at:: >> + >> + http://169.254.169.254/<version>/ganeti/pipe_in >> + >> +and to send data to the host by a POST request at:: >> + >> + http://169.254.169.254/<version>/ganeti/pipe_out >> + >> +As in a pipe, once the data are read, they will not be in the buffer >> anymore, so >> +subsequent get request to ``pipe_in`` will not return the same data twice. >> +Unlike a pipe, though, it will not be possible to perform blocking I/O >> +operations. >> + >> +The OS parameters will be accessible through a GET >> +request at:: >> + >> + http://169.254.169.254/<version>/ganeti/os/parameters/<visibility>.json >> + >> +as a JSON serialized dictionary. ``<visibility>`` will be either ``public`` >> or >> +``private`` or ``secret``. >> + >> +The installation scripts to be run inside the virtualized environment while >> the >> +instance is run in ``install`` mode will be available at:: >> + >> + http://169.254.169.254/<version>/ganeti/os/scripts/<script_name> >> + >> +where ``<script_name>`` is the name of the script. >> + >> +The host and the instances (as detailed in `Installation process in a >> +virtualized environment`_) will be able to create other communication >> channels >> +on the other ports of the same IP address. >> + > > This is a great idea. > We'd love to see a standard HTTP-based communication channel provided by > Ganeti, maintained even when the instance is in its final ``run`` mode. > > The question is, again, whether we need to create a tap interface > or not. Given that the interface (could) be made available to the instance > indefinitely, e.g., using the --communication ad hoc argument, it makes > sense to have it just as another network interface. This would enable > the administrator to use it just as another interface, e.g., set > policies on it, e.g., firewall it, or run tcpdump on it. 
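
To make the endpoint layout and the pipe semantics quoted above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the helper names (``ganeti_url``, ``os_parameters_url``, ``PipeBuffer``) are hypothetical and not part of Ganeti; only the IP address, the URL layout and the read-once, non-blocking behavior come from the design text.

```python
import collections

# Address of the metadata server, as given in the quoted design text.
METADATA_HOST = "169.254.169.254"


def ganeti_url(path, version="latest"):
    """Build a URL under the Ganeti-specific metadata namespace."""
    return "http://%s/%s/ganeti/%s" % (METADATA_HOST, version, path)


def os_parameters_url(visibility, version="latest"):
    """URL of the JSON dictionary holding OS parameters of one visibility."""
    if visibility not in ("public", "private", "secret"):
        raise ValueError("unknown visibility: %r" % (visibility,))
    return ganeti_url("os/parameters/%s.json" % visibility, version)


class PipeBuffer(object):
    """Hypothetical host-side buffer backing ``pipe_in``.

    Data is returned at most once, and reading never blocks: an empty
    buffer simply yields an empty byte string.
    """

    def __init__(self):
        self._chunks = collections.deque()

    def write(self, data):
        """Queue data for the instance to pick up with its next GET."""
        self._chunks.append(data)

    def read(self):
        """Return everything queued so far and drop it from the buffer."""
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data
```

With this sketch, two consecutive reads never return the same bytes, which matches the "once the data are read, they will not be in the buffer anymore" wording of the design.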
>
> Having it be created using KVM's "-net user" option would require a
> totally different way of managing it, as Apollon also notices.
>
> However, if it's just another interface, how does it interact with
> Ganeti's network modification mechanism? Will the user be able to remove
> this interface via gnt-instance modify? If yes, there'll be no way to
> bring it back. So perhaps this interface should be treated specially by
> Ganeti itself.

Yes, if it's going to be an interface, it will definitely be "special",
as it's managed specifically for and by the communication mechanism.
But the whole communication mechanism definitely needs some more thought
before being finalized.

>
>> +
>> +Rationale
>> +---------
>> +
>> +The choice of using a network interface for instance-host communication, as
>> +opposed to VirtIO, XenBus or other methods, is due to the will of having a
>> +generic, hypervisor-independent way of creating a communication channel, that
>> +doesn't require unusual (para)virtualization drivers.
>> +At the same time, a network interface was preferred over solutions involving
>> +virtual floppy or USB devices because the latter tend to be detected and
>> +configured by the guest operating systems, sometimes even in prominent positions
>> +in the user interface, whereas it is fairly common to have an unconfigured
>> +network interface in a system, usually without any negative side effects.
>> +
>> +
>> +Installation process in a virtualized environment
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>> +
>> +In the new OS installation scenario, we distinguish between trusted and
>> +untrusted code.
>> +
>> +The trusted installation code maintains the behavior of the current one, with
>> +the scripts running on the node the instance is being created on. The untrusted
>> +code is stored in a subdirectory of the OS definition called ``untrusted``.
>> +This directory contains scripts that are equivalent to the already existing >> +ones (``create``, ``export``, ``import``, ``rename``) but that will be run >> +inside an virtualized environment, to protect the host from malicious >> tampering. >> + >> +The ``untrusted`` code is meant to either be untrusted itself, or to be >> trusted >> +code running operations that might be dangerous (such as mounting a >> +user-provided image). >> + >> +In order to allow for the highest flexibility, if both a trusted and an >> +untrusted script are provided for the same operation (i.e. ``create``), >> both of >> +them will be executed at the same time, one on the host, and one inside the >> +installation appliance. They will be allowed to communicate with each other >> +through the already described communication mechanism, in order to >> orchestrate >> +their execution (e.g.: the untrusted code might execute the installation, >> while >> +the trusted one receives status updates from it and delivers them to a user >> +interface). >> + > > This is important, but I'll comment on it as a reply to your reply to > Guido, in another part of this thread. > >> +Ganeti will provide a script to be run at install time that can be used to >> +create the virtualized environment that will perform the OS installation of >> new >> +instances. >> +This script will build a debootstrapped basic debian system including >> including >> +a software that will read the metadata, setup the environment variables and >> +launch the installation scripts inside the virtualized environment. The >> script >> +will also provide hooks for personalization. >> + >> +It will also be possible to use other self-made virtualized environment, as >> long >> +as they connect to ganeti over the described communication mechanism and >> they >> +know how to read and use the provided metadata to create a new instance. 
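
As a rough illustration of the "read the metadata, setup the environment variables" step quoted above, the software inside the appliance could do something along these lines. This is a sketch under assumptions, not the planned implementation: the function name ``metadata_to_env`` and the shape of its input are hypothetical, and the ``INSTANCE_NAME`` variable and ``OSP_`` prefix are assumed to mirror what the current OS interface exports to the scripts.

```python
def metadata_to_env(instance_name, os_params):
    """Turn instance metadata into environment variables for the scripts.

    ``os_params`` is assumed to be the dictionary of OS parameters
    decoded from the metadata JSON. The variable naming is an
    assumption modelled on the current OS interface.
    """
    env = {"INSTANCE_NAME": instance_name}
    for name, value in sorted(os_params.items()):
        # One variable per OS parameter, upper-cased and prefixed.
        env["OSP_" + name.upper()] = str(value)
    return env
```

The resulting dictionary could then be passed as the environment of the installation script, so that scripts written for the current interface would need few or no changes.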
>> +
>> +While performing an installation in the virtualized environment, a
>> +personalizable timeout will be used to detect possible problems with the
>> +installation process, and to kill the virtualized environment.
>> +
>> +.. vim: set textwidth=72 :
>> +.. Local Variables:
>> +.. mode: rst
>> +.. fill-column: 72
>> +.. End:
>> --
>> 1.7.10.4
>
> Thanks again for the great design doc,
> Vangelis.

I'm glad you like it. Thank you for all the online and offline input.

Cheers,
Michele
