On Wed, Dec 11, 2013 at 4:10 PM, Jose A. Lopes <[email protected]> wrote:
> On Mon, Dec 09, 2013 at 10:30:17AM +0100, Michele Tartara wrote:
>> Add the document describing a new design for the OS installation process for
>> new instances.
>>
>> Signed-off-by: Michele Tartara <[email protected]>

LGTM

Thanks,

Guido


>> ---
>>  doc/design-draft.rst |   1 +
>>  doc/design-os.rst    | 399 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 400 insertions(+)
>>  create mode 100644 doc/design-os.rst
>>
>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>> index c821292..3ed3852 100644
>> --- a/doc/design-draft.rst
>> +++ b/doc/design-draft.rst
>> @@ -20,6 +20,7 @@ Design document drafts
>>     design-daemons.rst
>>     design-hsqueeze.rst
>>     design-ssh-ports.rst
>> +   design-os.rst
>>
>>  .. vim: set textwidth=72 :
>>  .. Local Variables:
>> diff --git a/doc/design-os.rst b/doc/design-os.rst
>> new file mode 100644
>> index 0000000..a26801a
>> --- /dev/null
>> +++ b/doc/design-os.rst
>> @@ -0,0 +1,399 @@
>> +===============================
>> +Ganeti OS installation redesign
>> +===============================
>> +
>> +.. contents:: :depth: 3
>> +
>> +This design document details a new OS installation procedure that is more
>> +secure, provides more features, and is easier to use for many common tasks
>> +than the current one.
>> +
>> +Current state and shortcomings
>> +==============================
>> +
>> +As of Ganeti 2.10, each instance is associated with an OS definition. An OS
>> +definition is a set of scripts (``create``, ``export``, ``import``, 
>> ``rename``)
>> +that are executed with root privileges on the primary host of the instance 
>> to
>> +perform all the OS-related functionality (setting up an operating system 
>> inside
>> +the disks of the instance being created, exporting/importing the instance,
>> +renaming it).
>> +
>> +These scripts receive, as environment variables, a fixed set of parameters
>> +describing the instance (such as the hypervisor, the name of the instance, 
>> the
>> +number of disks, and their location) and a set of user defined parameters. 
>> Each
>> +of these parameters is also written into the configuration file of Ganeti, 
>> to
>> +allow for future reinstalls of the instance, and in various log files, 
>> namely:
>> +
>> +* node daemon log file: contains DEBUG strings of the ``/os_validate``,
>> +  ``/instance_os_add`` and ``/instance_start`` RPC calls.
>> +
>> +* master daemon log file: DEBUG strings related to the same RPC calls are 
>> stored
>> +  here as well.
>> +
>> +* commands log: the CLI commands that create a new instance, including their
>> +  parameters, are logged here.
>> +
>> +* RAPI log: the RAPI commands that create new instances, including their
>> +  parameters, are logged here.
>> +
>> +* job logs: the job files stored in the job queue or in its archive contain 
>> the
>> +  parameters.
>> +
>> +The current situation presents a number of shortcomings:
>> +
>> +* Having the installation scripts run with root privileges on the nodes doesn't allow
>> +  user-defined OS scripts, as they would pose a huge security issue.
>> +  Furthermore, even a script without malicious intentions might end up
>> +  disrupting a node because of a bug in it.
>> +
>> +* Ganeti cannot be used to create instances starting from user provided disk
>> +  images: even in the (hypothetical) case where the scripts are completely
>> +  secure and run not by root but by an unprivileged user with only the 
>> power to
>> +  mount arbitrary files as disk images, this is a security issue. It has 
>> been
>> +  proven that a carefully crafted file system might exploit kernel
>> +  vulnerabilities to gain control of the system. Therefore, directly 
>> mounting
>> +  images on the Ganeti nodes is not an option.
>> +
>> +* There is no way to inject files into an existing disk image. A common use 
>> case
>> +  is for the system administrator to provide a standard image of the 
>> system, to
>> +  be later personalized with the network configuration, private keys 
>> identifying
>> +  the machine, ssh keys of the users and so on. A possible workaround would 
>> be
>> +  for the scripts to mount the image (only if this is trusted!) and to 
>> receive
>> +  the configurations and ssh keys as user defined OS parameters. 
>> Unfortunately,
>> +  this is also not an option for security sensitive material (such as the 
>> ssh
>> +  keys) because the OS parameters are stored in many places on the system, 
>> as
>> +  already described above.
>> +
>> +* Most other virtualization software simply works with instance images, not with
>> +  installation scripts. This difference makes the interaction of Ganeti with
>> +  other software difficult.
>> +
>> +Proposed changes
>> +================
>> +
>> +In order to fix the shortcomings of the current state, we plan to introduce 
>> the
>> +following changes:
>> +
>> +* Change the OS parameters to have three categories:
>> +
>> + * ``public``: the current behavior. The parameter is logged and stored 
>> freely.
>> +
>> + * ``private``: the parameter is saved inside the Ganeti configuration (to 
>> allow
>> +   for instance reinstall) but it is not shown in logs, job logs, or passed 
>> back
>> +   via RAPI.
>> +
>> + * ``secret``: the parameter is not saved inside the Ganeti configuration.
>> +   Reinstalls are impossible unless the data is passed again. The parameter will
>> +   not appear in any log file. When a functionality is performed jointly by
>> +   multiple daemons (such as MasterD and LuxiD), currently Ganeti sometimes
>> +   serializes jobs on disk and later reloads them. Secret parameters will not be
>> +   serialized on disk. They will be passed around as part of the LUXI calls
>> +   exchanged by the daemons, and only kept in memory, in order to reduce their
>> +   accessibility as much as possible. In case of a failure of the master node,
>> +   these parameters will be lost and cannot be recovered because they are not
>> +   serialized on file, therefore the job cannot be taken over by the new master.
>> +   This is an expected and accepted side effect of jobs with secret parameters:
>> +   if they fail, they'll have to be restarted manually.
>> +
>> +* A new OS installation procedure, based on a safe virtualized environment.
>> +  This virtualized environment will run with the same hardware parameters as the
>> +  actual instance being installed, as much as possible. This will also make it
>> +  possible to reduce the memory usage in the host (specifically, in Dom0 for Xen
>> +  installations). Each instance will have these possible execution modes:
>> +
>> +  * ``default``: the default mode, used when the machine is running 
>> normally and
>> +    the OS installation procedure is run before starting the instance for 
>> the
>> +    first time.
>
> Is this supposed to be ``run`` instead of ``default``?  Here it says
> default, but the rest of the document keeps mentioning the ``run`` mode,
> which doesn't seem to be anywhere.
>
> Thanks,
> Jose
>
>> +
>> +  * ``self_install``: the first run of the instance will be with a different set
>> +    of parameters w.r.t. all the successive runs. This set of "install
>> +    parameters" will make it possible, e.g., to attach an installation
>> +    floppy/cdrom/network, change the boot device order, or specify an OS image
>> +    to be used. Through this set of parameters, the administrator will have to
>> +    provide the hypervisor a way to find an installation medium for the instance
>> +    (e.g., a boot disk, a network image, etc). This medium will then install the
>> +    instance itself on the disks and will then be responsible for getting the
>> +    parameters for configuring it (its network interfaces, IP address, hostname,
>> +    etc.) from a set of metadata provided by Ganeti (e.g.: using an approach
>> +    comparable to the one of the ``cloud-init`` tool). When this installation
>> +    mode is used, no OS installation script is required.  In order for
>> +    installation of an OS from an image to be possible, the ``--os-type``
>> +    parameter will be extended to support a new additional format: ``--os-type
>> +    image:<URL>`` will instruct Ganeti to take an image from the specified
>> +    position. For the initial implementation, URL can be either a filename or a
>> +    publicly accessible http or ftp resource. Once the instance image is
>> +    received, it will be dd-ed on the first disk of the instance.
>> +    When an image is specified, ``--os-parameters`` can still be used,
>> +    and its content will be passed to the instance as part of the metadata. Note
>> +    that as part of the OS scripts there is a file specifying what parameters
>> +    are expected. With OS images, though, none of the traditional structure of
>> +    OS scripts is in place, so there will be no check regarding what parameters
>> +    can be specified: they will all be passed, as long as the
>> +    ``--os-parameters`` string is syntactically valid.
>> +    The set of ``self_install`` parameters will be stored as part of the
>> +    instance configuration, so that they can be used to reinstall the 
>> instance.
>> +    It will be the user's responsibility to ensure that the OS image or any
>> +    installation media is still available in the proper position when a
>> +    reinstall happens. After the first run, the instance will revert to
>> +    ``default`` mode.
>> +
>> +  * ``install``: Ganeti will start the instance using a virtual appliance
>> +    specifically made for installing Ganeti instances. Scripts analogous to 
>> the
>> +    current ones will run inside this instance. The disks of the instance 
>> being
>> +    installed will be connected to this virtual appliance, so that the 
>> scripts
>> +    can mount them and modify them as needed, as currently happens, but 
>> with the
>> +    additional protection given by this happening in a VM. The disk of the
>> +    virtual appliance will be read only, so that a pristine copy of the
>> +    appliance can be started every time a new instance needs to be created, 
>> to
>> +    further increase security. The data the instance needs to write at runtime
>> +    will only be stored in RAM, and disappear as soon as the instance is
>> +    stopped. Metadata will also be provided to this virtual appliance, which
>> +    will take care of converting it to environment variables for the
>> +    installation scripts. After the first run, the instance will revert to
>> +    ``default`` mode.
>> +
>> +* In order to allow for the metadata to be sent inside the instance, a
>> +  communication mechanism between the instance and the host will be created.
>> +  This mechanism will be bidirectional (e.g.: to allow the setup process 
>> going
>> +  on inside the instance to communicate its progress to the host). Each 
>> instance
>> +  will have access exclusively to its own metadata, and it will be only 
>> able to
>> +  communicate with its host over this channel. More details will be 
>> provided in
>> +  the `Communication mechanism and metadata service`_ section.
>> +
>> +* As part of the instance creation command it will be possible to indicate a URL
>> +  for a "personalization package", that is an archive containing a set of files
>> +  meant to be overlaid on top of the operating system file system at the end of
>> +  the setup process, before the VM is started for the first time in ``run``
>> +  mode.  Ganeti will provide a mechanism for receiving and unpacking this
>> +  archive as part of the ``install`` execution mode, whereas in ``self_install``
>> +  mode it will only be provided as metadata for the instance to use.  The
>> +  archive will be in TAR-GZIP format (with extension ``.tar.gz`` or 
>> ``.tgz``)
>> +  and will contain the files according to the directory structure that will 
>> be
>> +  recreated on the installation disk. Files contained in this archive will
>> +  overwrite files with the same path created during the install procedure 
>> (if
>> +  any).  The URL of the "personalization package" will have to specify an
>> +  extension to identify the file format (in order to allow for more formats to be
>> +  supported in the future).  The URL will be stored as part of the configuration
>> +  of the instance (therefore, the URL should not contain confidential
>> +  information, but the file available there can). It is up to the system
>> +  administrator to ensure that a package is actually available at that URL 
>> at
>> +  install and reinstall time.  The content of the package is allowed to 
>> change.
>> +  E.g.: a system administrator might create a package containing the private
>> +  keys of the instance being created. When the instance is reinstalled, a 
>> new
>> +  package with new keys can be made available there, therefore allowing 
>> instance
>> +  reinstall without the need to store keys.  Together with the URL, a username
>> +  and a password can be specified too. If the URL is an http(s) URL, they will be
>> +  used as basic access authentication credentials to access that URL. The
>> +  username and password will not be saved in the config, and will have to be
>> +  provided again in case a reinstall is requested.  The downloaded
>> +  personalization package will not be stored locally on the node for longer 
>> than
>> +  it is needed while unpacking it and adding its files to the instance being
>> +  created.  The personalization package will be overlaid on top of the instance
>> +  filesystem after the scripts that created it have been executed.  In order for
>> +  the files in the package to be automatically overlaid on top of the instance
>> +  filesystem it is required that the appliance is actually able to mount the
>> +  instance disks, therefore this will not work for every filesystem.
>> +
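
One way to read the three parameter categories above, as a minimal Python
sketch. The function names and the ``<redacted>`` placeholder are made up for
illustration and are not existing Ganeti code; the sketch only assumes that
each parameter carries its visibility next to its value.

```python
# Hypothetical sketch: none of these names exist in Ganeti.
PUBLIC, PRIVATE, SECRET = "public", "private", "secret"


def redact_for_log(params):
    """Return a copy safe for logs: only public values are shown."""
    return {
        name: value if visibility == PUBLIC else "<redacted>"
        for name, (value, visibility) in params.items()
    }


def strip_for_config(params):
    """Return the parameters that may be saved in the configuration.

    Secret parameters are dropped, so a reinstall needs them passed again.
    """
    return {
        name: (value, visibility)
        for name, (value, visibility) in params.items()
        if visibility != SECRET
    }


# Made-up example values.
params = {
    "dhcp": ("yes", PUBLIC),
    "root_password": ("hunter2", PRIVATE),
    "ssh_key": ("-----BEGIN ...", SECRET),
}
```

With this reading, ``private`` and ``secret`` differ only in what reaches the
configuration; both are hidden from logs.
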
>> +Implementation
>> +==============
>> +
>> +The implementation of this design will happen as an ordered sequence of 
>> steps,
>> +of increasing impact on the system and, in some cases, dependent on each 
>> other:
>> +
>> +#. Private and secret instance parameters
>> +#. Communication mechanism between host and instance
>> +#. Metadata service
>> +#. Personalization package (inside a virtualization environment)
>> +#. ``self_install`` mode
>> +#. ``install`` mode (inside a virtualization environment)
>> +
>> +Some of these steps need to be more deeply specified w.r.t. what is already
>> +written in the `Proposed changes`_ Section. Extra details will be provided 
>> in
>> +the following Subsections.
>> +
>> +Communication mechanism and metadata service
>> +++++++++++++++++++++++++++++++++++++++++++++
>> +
>> +The communication mechanism and the metadata service are described together
>> +because they are deeply tied. On the other hand, the communication mechanism
>> +will need to be more generic because it can be used for other reasons in the
>> +future (like allowing instances to explicitly send commands to Ganeti, or 
>> to let
>> +Ganeti control a helper instance, like the one hereby introduced for 
>> performing
>> +OS installs inside a safe environment).
>> +
>> +The communication mechanism will be enabled automatically when the instance 
>> is
>> +in ``self_install`` or ``install`` mode, but for backwards compatibility it 
>> will
>> +be disabled when the instance is in ``run`` mode unless it is explicitly
>> +requested. Specifically, a new parameter ``--communication`` (short version:
>> +``-C``), with possible values ``true`` or ``false`` will be added to
>> +``gnt-instance add`` and ``gnt-instance modify``. It will determine whether 
>> the
>> +instance will have a communication channel set up to interact with the host 
>> and
>> +to receive metadata. The value of this parameter will be saved as part of 
>> the
>> +configuration of the instance.
>> +
>> +When the communication mechanism is enabled, Ganeti will create a new 
>> network
>> +interface inside the instance. This extra network interface will be the 
>> last one
>> +of the instance, after all the user defined ones. On the host side, this
>> +interface will be only accessible to the host itself, and not be routed 
>> outside
>> +the machine.
>> +On this network interface, the instance will connect using the IP:
>> +169.254.169.1 and netmask 255.255.255.0.
>> +The host will be on the same network, with the IP address: 169.254.169.254.
>> +
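
A quick sanity check of the addressing above, sketched with Python's standard
``ipaddress`` module. It only uses the addresses and netmask quoted in the
text; the variable names are arbitrary.

```python
import ipaddress

# Values taken from the paragraph above.
network = ipaddress.ip_network("169.254.169.0/255.255.255.0")
instance_ip = ipaddress.ip_address("169.254.169.1")
host_ip = ipaddress.ip_address("169.254.169.254")

# Both ends must sit in the same /24 for the channel to work without routing.
same_network = instance_ip in network and host_ip in network
print(network, same_network, network.is_link_local)
```

The network falls inside the link-local range, which fits the requirement
that this traffic is never routed outside the machine.
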
>> +The way to create this interface depends on the specific hypervisor being 
>> used.
>> +In KVM, it is possible to create a network interface inside the instance 
>> without
>> +having a corresponding interface created on the host. Using a command like::
>> +
>> +  kvm -net nic -net \
>> +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080
>> +
>> +a network interface will be created inside the VM, part of the 
>> 169.254.169.0/24
>> +network, where the VM will have IP address .253 and the host port 8080 will 
>> be
>> +reachable on port 80.
>> +
>> +In Xen, unfortunately, such a capability is not present, and an actual 
>> network
>> +interface has to be created on the host (using the ``vif`` parameter of the 
>> Xen
>> +configuration file). Each instance will have its corresponding ``vif`` network
>> +interface on the host. These interfaces will not be connected to each other in
>> +any way, and Ganeti will not configure them to allow traffic to be forwarded
>> +beyond the host machine. The ``vif-route`` script of Xen might be helpful in
>> +implementing this.
>> +It will be up to the system administrator to ensure that extra firewalling and routing
>> +rules specified on the host don't allow this accidentally.
>> +
>> +The instance will be able to connect to 169.254.169.254:80, and issue GET
>> +requests to an HTTP server that will provide the instance metadata.
>> +
>> +The choice of this IP address and port for accessing the metadata is done 
>> for
>> +compatibility reasons with OpenStack's and Amazon EC2's ways of providing
>> +metadata to the instance. The metadata will be provided by a single daemon,
>> +which will determine what instance the request comes from and reply with the
>> +metadata specific for that instance.
>> +
>> +Where possible, the metadata will be provided in a way compatible with 
>> Amazon
>> +EC2, at::
>> +
>> +  http://169.254.169.254/<version>/meta-data/*
>> +
>> +If some metadata are Ganeti-specific and don't fit this structure, they 
>> will be
>> +provided at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/meta_data.json
>> +
>> +``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to 
>> indicate
>> +the most recent available protocol version.
>> +
>> +If needed in the future, this structure also allows us to support 
>> OpenStack's
>> +metadata at::
>> +
>> +  http://169.254.169.254/openstack/<version>/meta_data.json
>> +
>> +A bi-directional, pipe-like communication channel will be provided. The 
>> instance
>> +will be able to receive data from the host by a GET request at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/read
>> +
>> +and to send data to the host by a POST request at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/write
>> +
>> +As in a pipe, once the data are read, they will not be in the buffer anymore, so
>> +subsequent GET requests to ``read`` will not return the same data twice.
>> +Unlike a pipe, though, it will not be possible to perform blocking I/O
>> +operations.
>> +
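
From the instance side, the ``read``/``write`` channel could be used roughly
as follows. This is a sketch using only the Python standard library; the
helper names are hypothetical, and the design does not specify the body
format or the response codes, so raw bytes and the plain HTTP status are
assumptions here.

```python
import urllib.request

BASE = "http://169.254.169.254/ganeti"


def channel_url(endpoint, version="latest"):
    """Build the URL for the ``read`` or ``write`` endpoint."""
    if endpoint not in ("read", "write"):
        raise ValueError("endpoint must be 'read' or 'write'")
    return "%s/%s/%s" % (BASE, version, endpoint)


def read_from_host(version="latest", timeout=5):
    """GET whatever the host has buffered (assumed to be raw bytes).

    Data already read is not returned again by later calls.
    """
    url = channel_url("read", version)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def write_to_host(data, version="latest", timeout=5):
    """POST raw bytes to the host and return the HTTP status code."""
    req = urllib.request.Request(
        channel_url("write", version), data=data, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Since blocking I/O is ruled out, a caller waiting for data would have to poll
``read_from_host()`` at some interval of its own choosing.
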
>> +The OS parameters will be accessible through a GET
>> +request at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/os/parameters.json
>> +
>> +as a JSON serialized dictionary having the parameter name as the key, and 
>> the
>> +pair ``(<value>, <visibility>)`` as the value, where ``<value>`` is the
>> +user-provided value of the parameter, and ``<visibility>`` is either 
>> ``public``,
>> +``private`` or ``secret``.
>> +
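
A consumer of ``parameters.json`` might parse it like this. The sketch
assumes each ``(<value>, <visibility>)`` pair is serialized as a two-element
JSON array, since JSON has no tuple type; the sample document and the
function name are made up.

```python
import json

# Made-up sample of what parameters.json might contain.
sample = '{"dhcp": ["yes", "public"], "root_password": ["hunter2", "private"]}'


def parse_os_parameters(text):
    """Map each parameter name to a (value, visibility) tuple."""
    allowed = {"public", "private", "secret"}
    result = {}
    for name, (value, visibility) in json.loads(text).items():
        if visibility not in allowed:
            raise ValueError(
                "unknown visibility for %s: %r" % (name, visibility))
        result[name] = (value, visibility)
    return result


params = parse_os_parameters(sample)
```
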
>> +The installation scripts to be run inside the virtualized environment while 
>> the
>> +instance is run in ``install`` mode will be available at::
>> +
>> +  http://169.254.169.254/ganeti/<version>/os/scripts/<script_name>
>> +
>> +where ``<script_name>`` is the name of the script.
>> +
>> +
>> +Rationale
>> +---------
>> +
>> +The choice of using a network interface for instance-host communication, as
>> +opposed to VirtIO, XenBus or other methods, is due to the desire to have a
>> +generic, hypervisor-independent way of creating a communication channel that
>> +doesn't require unusual (para)virtualization drivers.
>> +At the same time, a network interface was preferred over solutions involving
>> +virtual floppy or USB devices because the latter tend to be detected and
>> +configured by the guest operating systems, sometimes even in prominent 
>> positions
>> +in the user interface, whereas it is fairly common to have an unconfigured
>> +network interface in a system, usually without any negative side effects.
>> +
>> +
>> +Installation process in a virtualized environment
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>> +
>> +In the new OS installation scenario, we distinguish between trusted and
>> +untrusted code.
>> +
>> +The trusted installation code maintains the behavior of the current one and
>> +requires no modifications, with the scripts running on the node the 
>> instance is
>> +being created on. The untrusted code is stored in a subdirectory of the OS
>> +definition called ``untrusted``.  This directory contains scripts that are
>> +equivalent to the already existing ones (``create``, ``export``, ``import``,
>> +``rename``) but that will be run inside a virtualized environment, to protect
>> +the host from malicious tampering.
>> +
>> +The ``untrusted`` code is meant to either be untrusted itself, or to be 
>> trusted
>> +code running operations that might be dangerous (such as mounting a
>> +user-provided image).
>> +
>> +By default, all new OS definitions will have to be explicitly marked as 
>> trusted
>> +by the cluster administrator (with a new ``gnt-os modify`` command) before 
>> they
>> +can run code on the host. Otherwise, only the untrusted part of the code 
>> will be
>> +allowed to run, inside the virtual appliance. For backwards compatibility
>> +reasons, when upgrading an existing cluster, all the installed OSes will be
>> +marked as trusted, so that they can keep running with no changes.
>> +
>> +In order to allow for the highest flexibility, if both a trusted and an
>> +untrusted script are provided for the same operation (e.g. ``create``), both of
>> +them will be executed at the same time, one on the host, and one inside the
>> +installation appliance. They will be allowed to communicate with each other
>> +through the already described communication mechanism, in order to 
>> orchestrate
>> +their execution (e.g.: the untrusted code might execute the installation, 
>> while
>> +the trusted one receives status updates from it and delivers them to a user
>> +interface).
>> +
>> +The cluster administrator will have an option to completely disable scripts
>> +running on the host, leaving only the ones running in the VM.
>> +
>> +Ganeti will provide a script to be run at install time that can be used to
>> +create the virtualized environment that will perform the OS installation of 
>> new
>> +instances.
>> +This script will build a debootstrapped basic Debian system including
>> +software that will read the metadata, set up the environment variables and
>> +launch the installation scripts inside the virtualized environment. The script
>> +will also provide hooks for personalization.
>> +
>> +It will also be possible to use other self-made virtualized environments, as long
>> +as they connect to Ganeti over the described communication mechanism and they
>> +know how to read and use the provided metadata to create a new instance.
>> +
>> +While performing an installation in the virtualized environment, a
>> +customizable timeout will be used to detect possible problems with the
>> +installation process, and to kill the virtualized environment. The timeout will
>> +be optional and set on a cluster basis by the administrator. If set, it will be
>> +the total time allowed to set up an instance inside the appliance. It is mainly
>> +meant as a safety measure to prevent an instance taken over by malicious scripts
>> +from being available for a long time.
>> +
>> +.. vim: set textwidth=72 :
>> +.. Local Variables:
>> +.. mode: rst
>> +.. fill-column: 72
>> +.. End:
>> --
>> 1.8.5.1
>>
>
> --
> Jose Antonio Lopes
> Ganeti Engineering
> Google Germany GmbH
> Dienerstr. 12, 80331, München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores
> Steuernummer: 48/725/00206
> Umsatzsteueridentifikationsnummer: DE813741370



-- 
Guido Trotter
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
