On Fri, Jan 10, 2014 at 05:50:32PM +0200, Vangelis Koukis wrote:
> On Fri, Jan 10, 2014 at 03:02:39pm +0100, Jose A. Lopes wrote:
> > Redesign the communication mechanism in light of implementation
> > limitations that have recently come up in prototypes using KVM.
> >
> > Signed-off-by: Jose A. Lopes <[email protected]>
> 
> Hello Jose,
> 
> a few comments/questions follow inline.
> 
> > ---
> >  doc/design-os.rst | 213 
> > +++++++++++++++++++++++++++++++++++-------------------
> >  1 file changed, 137 insertions(+), 76 deletions(-)
> > 
> > diff --git a/doc/design-os.rst b/doc/design-os.rst
> > index 6281682..5387b9a 100644
> > --- a/doc/design-os.rst
> > +++ b/doc/design-os.rst
> > @@ -75,8 +75,8 @@ In order to fix the shortcomings of the current state, we 
> > plan to introduce the
> >  following changes.
> >  
> >  
> > -OS parameters categories
> > -++++++++++++++++++++++++
> > +OS parameter categories
> > ++++++++++++++++++++++++
> >  
> >  Change the OS parameters to have three categories:
> >  
> > @@ -110,7 +110,7 @@ instance to communicate its progress to the host). Each 
> > instance will have
> >  access exclusively to its own metadata, and it will be only able to 
> > communicate
> >  with its host over this channel.  This is the approach followed the
> >  ``cloud-init`` tool and more details will be provided in the `Communication
> > -mechanism and metadata service`_ section.
> > +mechanism`_ and `Metadata service`_ sections.
> >  
> >  
> >  Installation procedure
> > @@ -242,87 +242,97 @@ Some of these steps need to be more deeply specified 
> > w.r.t. what is already
> >  written in the `Proposed changes`_ Section. Extra details will be provided 
> > in
> >  the following subsections.
> >  
> > -Communication mechanism and metadata service
> > -++++++++++++++++++++++++++++++++++++++++++++
> > +Communication mechanism
> > ++++++++++++++++++++++++
> >  
> > -The communication mechanism and the metadata service are described together
> > -because they are deeply tied. The communication mechanism will be made more
> > -generic because it can be used for other purposes in the future (like 
> > allowing
> > -instances to explicitly send commands to Ganeti, or to let Ganeti control a
> > -helper instance, like the one hereby introduced for performing OS installs
> > -inside a safe environment).
> > +The communication mechanism will be a generic communication channel between
> > +Ganeti and the instances, not only to provide access to the metadata 
> > service,
> > +but also to allow instances to send commands directly to Ganeti or request
> > +changes to parameters, such as, those related to the distribution 
> > upgrades, or
> > +even let Ganeti control a helper instance, such as, the one for performing 
> > OS
> > +installs inside a safe environment, as introduced in this document.
> >  
> >  The communication mechanism will be enabled automatically during an 
> > installation
> > -procedure that requires a virtualized environment, but for backwards
> > -compatibility it will be disabled when the instance is running normally, 
> > unless
> > -it is explicitly requested. Specifically, a new parameter
> > -``--communication=yes|no`` (short version: ``-C``) will be added to
> > -``gnt-instance add`` and ``gnt-instance modify``. It will determine 
> > whether the
> > -instance has a communication channel set to interact with the host and 
> > receive
> > -metadata. The value of this parameter will be saved as part of the 
> > configuration
> > -of the instance.
> > -
> > -When the communication mechanism is enabled, Ganeti will create a new 
> > network
> > -interface inside the instance. This additional network interface will be 
> > the
> > -last one in the instance, after all the user defined ones. On the host 
> > side,
> > -this interface will only be accessible to the host itself, and not routed
> > -outside the machine.
> > -On this network interface, the instance will connect using the IP:
> > -169.254.169.253 and netmask 255.255.255.0.
> > -The host will be on the same network, with the IP address: 169.254.169.254.
> > -
> > -The way to create this interface depends on the specific hypervisor being 
> > used.
> > -In KVM, it is possible to create a network interface inside the instance 
> > without
> > -having a corresponding interface created on the host. Using a command 
> > like::
> > -
> > -  kvm -net nic -net \
> > -    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > -    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080
> > -
> > -a network interface will be created inside the VM, part of the 
> > 169.254.169.0/24
> > -network, where the VM will have IP address .253 and the host port 8080 
> > will be
> > -reachable on port 80.
> > -
> > -In Xen, unfortunately, such a capability is not present, and an actual 
> > network
> > -interface has to be created on the host (using the ``vif`` parameter of 
> > the Xen
> > -configuration file). Each instance will have its corresponding ``vif`` 
> > network
> > -interface on the host. These interfaces will not be connected to each 
> > other in
> > -any way, and Ganeti will not configure them to allow traffic to be 
> > forwarded
> > -beyond the host machine. The ``vif-route`` script of Xen might be helpful 
> > in
> > -implementing this.
> > -It will be the system administrator's responsibility to ensure that the 
> > extra
> > -firewalling and routing rules specified on the host don't allow this
> > -accidentally.
> > -
> > -The instance will be able to connect to 169.254.169.254:80, and issue GET
> > -requests to an HTTP server that will provide the instance metadata.
> > -
> > -The choice of this IP address and port for accessing the metadata is done 
> > for
> > -compatibility reasons with OpenStack's and Amazon EC2's ways of providing
> > -metadata to the instance. The metadata will be provided by a single daemon,
> > -which will determine what instance the request comes from and reply with 
> > the
> > -metadata specific for that instance.
> > +procedure that requires a virtualized environment, but, for backwards
> > +compatibility, it will be disabled when the instance is running normally, 
> > unless
> > +explicitly requested.  Specifically, a new parameter 
> > ``--communication=yes|no``
> > +(short version: ``-C``) will be added to ``gnt-instance add`` and 
> > ``gnt-instance
> > +modify``.  This parameter will determine whether the communication 
> > mechanism is
> > +enabled for a particular instance.  The value of this parameter will be 
> > saved as
> > +part of the instance's configuration.
> > +
> > +The communication mechanism will be implemented through network interfaces 
> > on
> > +the host and the guest.  The host will create a TAP network interface for 
> > each
> > +guest.  This network interface will be connected to the guest's last 
> > network
> > +interface, which is meant to be used exclusively for the communication 
> > mechanism
> > +and is defined after all the user-defined interfaces.  Moreover, the 
> > network
> > +interfaces provide a communication channel that is solely used by the host 
> > and
> > +each guest, therefore, a guest cannot use this network interface to reach 
> > the
> > +outside world or other guests.  It is the system administrator's 
> > responsibility
> > +to ensure that the extra firewalling and routing rules specified on the 
> > host do
> > +not override this behaviour accidentally.
> > +
> > +On the host side, these TAP network interfaces will have IP address
> > +``169.254.169.254`` in the network ``169.254.0.0/16`` (i.e., netmask
> > +``255.255.0.0``).  On the guest side, each instance will have its own MAC
> > +address and an IP address in the network ``169.254.0.0/16``.  The MAC 
> > address
> > +and the IP address must be unique within a single host. The guest will use 
> > the
> 
> It's not very clear to me who will be responsible for setting up
> the host side of the TAP interfaces. Who will be responsible for
> assigning the IP address 169.254.169.254 on all TAP interfaces of the
> host, and what will the routing rules be? To clarify, say I have 3 VMs,
> on tap0, tap1 and tap2, with IPs 169.254.0.1, 169.254.0.2, 169.254.0.3
> respectively.
> 
> If the host has IP 169.254.169.254 on all interfaces, with the same /16
> netmask, how will it be able to pick the right interface when sending an
> IP packet to VM1 vs. when sending to VM3?
> 
> I think this could work with explicit routes: One to 169.254.0.1/32
> through tap0, one to 169.254.0.2/32 through tap1, and one to
> 169.254.0.3/32 through tap2. If yes, will Ganeti set up these routes
> explicitly?
> 
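
A minimal sketch of the explicit per-guest routes for the three-VM
example, assuming the ``route add -host <ip> dev <ifname>`` form that
the patch proposes.  The helper name ``host_route_commands`` and the
interface-to-IP mapping are hypothetical, and the code only builds the
command strings; it does not run them and is not a statement of how
Ganeti will set the routes up.

```python
import ipaddress

# Link-local network used by the communication mechanism in the design.
COMM_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def host_route_commands(guests):
    """Return one 'route add -host' command per TAP interface.

    `guests` maps a TAP interface name to the guest's IP address.
    Hypothetical helper, for illustration only.
    """
    commands = []
    seen = set()
    for ifname, ip in sorted(guests.items()):
        addr = ipaddress.ip_address(ip)
        # Guest addresses must live in the communication network.
        if addr not in COMM_NETWORK:
            raise ValueError("%s is outside %s" % (ip, COMM_NETWORK))
        # Guest addresses must be unique within a single host.
        if addr in seen:
            raise ValueError("duplicate guest IP %s" % ip)
        seen.add(addr)
        commands.append("route add -host %s dev %s" % (addr, ifname))
    return commands


if __name__ == "__main__":
    example = {
        "tap0": "169.254.0.1",
        "tap1": "169.254.0.2",
        "tap2": "169.254.0.3",
    }
    for cmd in host_route_commands(example):
        print(cmd)
```

With one host route per TAP interface, the kernel can pick the right
interface for each guest even though every TAP shares the address
169.254.169.254.
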
> On a similar note, who will be responsible for setting up the DHCP
> server? It could be the administrator's responsibility, but then if it
> is Ganeti the entity which picks the MAC addresses and IPs for the guest
> side of the TAP interfaces, how will this DHCP server be notified, so as
> to only serve the correct IP addresses to specific MAC addresses?
> 
> Also, if it is the administrator's responsibility, then perhaps the
> admin should be able to set up standard ifup hooks, like for every
> other interface of an instance. But in the following examples, you
> specifically set script=no,downscript=no.
> 
> Another possibility would be for Ganeti to come prepackaged with its
> own, embedded DHCP server just for serving requests on the TAPs used for
> the communication mechanism. We've been using snf-nfdhcpd
> (https://code.grnet.gr/projects/snf-nfdhcpd) for just that in
> production.

In snf-nfdhcpd, how do you configure which interfaces to listen on and
the (MAC, IP) pairs to serve?  I couldn't find any documentation, and the
configuration file does not seem to contain these.

> Actually, in a previous conversation Guido had asked us to document how to
> set it up with Ganeti, and merge the resulting docs with the Ganeti
> upstream. Perhaps it would make sense to combine the effort now, and use
> snf-nfdhcpd as an embedded DHCP server with Ganeti. Sorry for not having
> documented it earlier.
> 
> > +DHCP protocol on its last network interface to contact a DHCP server 
> > running on
> > +the host and thus determine its IP address.  The DHCP server will be 
> > listening
> > +exclusively on the TAP network interfaces of the guests.  Therefore, it 
> > will not
> > +interfere with a potential DHCP server running on the same host.  
> > Furthermore,
> > +the DHCP server will only recognize MAC and IP address pairs that have been
> > +approved by Ganeti.
> > +
> > +The TAP network interfaces created for each guest all share the same IP 
> > address.
> > +Therefore, it will be necessary to extend the routing table with rules 
> > specific
> > +to each guest.  This can be achieved with the following command, which 
> > takes the
> > +guest's unique IP address and its TAP interface::
> > +
> > +  route add -host <ip> dev <ifname>
> > +
> > +For KVM, an instance will be started with a unique MAC address and the TAP
> > +network interface name meant to be used by the communication mechanism.  
> > KVM
> > +creates the actual interface::
> > +
> > +  kvm -net nic,macaddr=<mac> -net 
> > tap,ifname=<ifname>,script=no,downscript=no ...
> > +
> 
> If I understand correctly, in previous versions of Ganeti it used to be
> the case that KVM opened the actual TAP interface, upon initialization
> of the KVM process. This was changed however (see commit 5d9bfd870a) so
> that Ganeti itself created the TAP interface, then passed it as an open
> file descriptor to the KVM process. Is there any reason to deviate from
> this, and make handling the TAP interface for the communication
> mechanism a special case?
> 
> Also, the same question applies as above. If setting up the DHCP server 
> is the responsibility of the administrator, then perhaps Ganeti should
> support running ifup hooks for the TAPs. Or, Ganeti could come with its
> own embedded DHCP server and handle everything by itself, without
> messing with an already existing DHCP server.
> 
> Thanks,
> Vangelis.
> 
> > +For Xen, a network interface will be created on the host (using the ``vif``
> > +parameter of the Xen configuration file).  Each instance will have its
> > +corresponding ``vif`` network interface on the host.  The ``vif-route`` 
> > script
> > +of Xen might be helpful in implementing this.
> > +
> > +
> > +Metadata service
> > +++++++++++++++++
> > +
> > +An instance will be able to reach the metadata service on 
> > ``169.254.169.254:80`` in
> > +order to, for example, retrieve its metadata.  This IP address and port 
> > were
> > +chosen for compatibility with the OpenStack and Amazon EC2 metadata 
> > service.
> > +The metadata service will be provided by a single daemon, which will 
> > determine
> > +the source instance for a given request and reply with the metadata 
> > pertaining
> > +to that instance.
> >  
> >  Where possible, the metadata will be provided in a way compatible with 
> > Amazon
> >  EC2, at::
> >  
> >    http://169.254.169.254/<version>/meta-data/*
> >  
> > -If some metadata are Ganeti-specific and don't fit this structure, they 
> > will be
> > -provided at::
> > +Ganeti-specific metadata that does not fit this structure will be 
> > provided
> > +at::
> >  
> >    http://169.254.169.254/ganeti/<version>/meta_data.json
> >  
> > -``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to 
> > indicate
> > -the most recent available protocol version.
> > +where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to
> > +indicate the most recent available protocol version.
> >  
> >  If needed in the future, this structure also allows us to support 
> > OpenStack's
> >  metadata at::
> >  
> >    http://169.254.169.254/openstack/<version>/meta_data.json
> >  
> > -A bi-directional, pipe-like communication channel will be provided. The 
> > instance
> > -will be able to receive data from the host by a GET request at::
> > +A bi-directional, pipe-like communication channel will also be provided.  
> > The
> > +instance will be able to receive data from the host by a GET request at::
> >  
> >    http://169.254.169.254/ganeti/<version>/read
> >  
> > @@ -331,12 +341,10 @@ and to send data to the host by a POST request at::
> >    http://169.254.169.254/ganeti/<version>/write
> >  
> >  As in a pipe, once the data are read, they will not be in the buffer 
> > anymore, so
> > -subsequent GET requests to ``read`` will not return the same data twice.
> > -Unlike a pipe, though, it will not be possible to perform blocking I/O
> > -operations.
> > +subsequent GET requests to ``read`` will not return the same data.  
> > However,
> > +unlike a pipe, it will not be possible to perform blocking I/O operations.
> >  
> > -The OS parameters will be accessible through a GET
> > -request at::
> > +The OS parameters will be accessible through a GET request at::
> >  
> >    http://169.254.169.254/ganeti/<version>/os/parameters.json
> >  
> > @@ -424,8 +432,61 @@ the total time allowed to setup an instance inside the 
> > appliance. It is mainly
> >  meant as a safety measure to prevent an instance taken over by malicious 
> > scripts
> >  to be available for a long time.
> >  
> > -.. vim: set textwidth=72 :
> > -.. Local Variables:
> > -.. mode: rst
> > -.. fill-column: 72
> > -.. End:
> > +
> > +Port forwarding in KVM
> > +++++++++++++++++++++++
> > +
> > +The communication mechanism could have been implemented in KVM using guest 
> > port
> > +forwarding, as opposed to network interfaces.  There are two alternatives 
> > in
> > +KVM's guest port forwarding, namely, creating a forwarding device, such 
> > as, a
> > +TCP/IP connection, or executing a command.  However, we have determined 
> > that
> > +both of these options are not viable.
> > +
> > +A TCP/IP forwarding device can be created through the following KVM 
> > invocation::
> > +
> > +  kvm -net nic -net \
> > +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > +    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ...
> > +
> > +This invocation even has the advantage that it can remap ports, which would 
> > have
> > +allowed the metadata service daemon to run on port 8080 instead of 80.  
> > However,
> > +in this scheme, KVM opens the TCP connection only once, when it is 
> > started, and,
> > +if the connection breaks, KVM will not reconnect.  Furthermore, this also
> > +interferes with the HTTP protocol, which needs to dynamically establish and
> > +close connections.
> > +
> > +The alternative to opening a single TCP/IP connection is to execute a 
> > command.
> > +The KVM invocation for this is, for example, the following::
> > +
> > +  kvm -net nic -net \
> > +    "user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > +    guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ...
> > +
> > +The advantage of this approach is that the command is executed each time 
> > the
> > +guest initiates a connection.  This is the ideal situation, however, it is 
> > only
> > +supported in KVM 1.2 and above, and, therefore, not viable because we want 
> > to
> > +provide support for at least KVM version 1.0, which is the version 
> > provided by
> > +Ubuntu LTS.
> > +
> > +
> > +Alternatives to the DHCP server
> > ++++++++++++++++++++++++++++++++
> > +
> > +There are alternatives to using the DHCP server, for example, by assigning
> > +identical IP addresses to guests, such as, the IP address 
> > ``169.254.169.253``.
> > +However, this introduces a routing problem, namely, how to route incoming
> > +packets from the same source IP to the host.  This problem can be overcome 
> > in a
> > +number of ways.
> > +
> > +The first solution is to use NAT to translate the incoming guest IP 
> > address, for
> > +example, ``169.254.169.253``, to an IP address unique within a single 
> > host, for
> > +example, ``169.254.0.1``.  Given that NAT through ``ip rule`` is 
> > deprecated,
> > +users can resort to ``iptables``.  Note that this has not yet been tested.
> > +
> > +Another option, which has indeed been tested in a prototype, is to connect 
> > the
> > +TAP network interfaces of the guests to a bridge.  The bridge takes the
> > +configuration for the TAP network interfaces, namely, IP address
> > +``169.254.169.254`` and netmask ``255.255.0.0``, thus leaving those 
> > interfaces
> > +without an IP address.  Note that in this setting, guests will be able to 
> > reach
> > +each other, therefore, if necessary, additional ``iptables`` rules can be 
> > put in
> > +place to prevent it.
> > -- 
> > 1.8.5.1
> 
> -- 
> Vangelis Koukis
> [email protected]
> OpenPGP public key ID:
> pub  1024D/1D038E97 2003-07-13 Vangelis Koukis <[email protected]>
>      Key fingerprint = C5CD E02E 2C78 7C10 8A00  53D8 FBFC 3799 1D03 8E97
> 
> Only those who will risk going too far
> can possibly find out how far one can go.
>         -- T.S. Eliot



-- 
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
