On Fri, Jan 10, 2014 at 03:02:39pm +0100, Jose A. Lopes wrote:
> Redesign the communication mechanism in light of implementation
> limitations that have recently come up in prototypes using KVM.
>
> Signed-off-by: Jose A. Lopes <[email protected]>
Hello Jose,

a few comments/questions follow inline.

> ---
>  doc/design-os.rst | 213 +++++++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 137 insertions(+), 76 deletions(-)
>
> diff --git a/doc/design-os.rst b/doc/design-os.rst
> index 6281682..5387b9a 100644
> --- a/doc/design-os.rst
> +++ b/doc/design-os.rst
> @@ -75,8 +75,8 @@ In order to fix the shortcomings of the current state, we plan to introduce the
>  following changes.
>
>
> -OS parameters categories
> -++++++++++++++++++++++++
> +OS parameter categories
> ++++++++++++++++++++++++
>
>  Change the OS parameters to have three categories:
>
> @@ -110,7 +110,7 @@ instance to communicate its progress to the host). Each instance will have
>  access exclusively to its own metadata, and it will be only able to communicate
>  with its host over this channel. This is the approach followed the
>  ``cloud-init`` tool and more details will be provided in the `Communication
> -mechanism and metadata service`_ section.
> +mechanism`_ and `Metadata service`_ sections.
>
>
>  Installation procedure
> @@ -242,87 +242,97 @@ Some of these steps need to be more deeply specified w.r.t. what is already
>  written in the `Proposed changes`_ Section. Extra details will be provided in
>  the following subsections.
>
> -Communication mechanism and metadata service
> -++++++++++++++++++++++++++++++++++++++++++++
> +Communication mechanism
> ++++++++++++++++++++++++
>
> -The communication mechanism and the metadata service are described together
> -because they are deeply tied. The communication mechanism will be made more
> -generic because it can be used for other purposes in the future (like allowing
> -instances to explicitly send commands to Ganeti, or to let Ganeti control a
> -helper instance, like the one hereby introduced for performing OS installs
> -inside a safe environment).
> +The communication mechanism will be a generic communication channel between
> +Ganeti and the instances, not only to provide access to the metadata service,
> +but also to allow instances to send commands directly to Ganeti or request
> +changes to parameters, such as, those related to the distribution upgrades, or
> +even let Ganeti control a helper instance, such as, the one for performing OS
> +installs inside a safe environment, as introduced in this document.
>
>  The communication mechanism will be enabled automatically during an installation
> -procedure that requires a virtualized environment, but for backwards
> -compatibility it will be disabled when the instance is running normally, unless
> -it is explicitly requested. Specifically, a new parameter
> -``--communication=yes|no`` (short version: ``-C``) will be added to
> -``gnt-instance add`` and ``gnt-instance modify``. It will determine whether the
> -instance has a communication channel set to interact with the host and receive
> -metadata. The value of this parameter will be saved as part of the configuration
> -of the instance.
> -
> -When the communication mechanism is enabled, Ganeti will create a new network
> -interface inside the instance. This additional network interface will be the
> -last one in the instance, after all the user defined ones. On the host side,
> -this interface will only be accessible to the host itself, and not routed
> -outside the machine.
> -On this network interface, the instance will connect using the IP:
> -169.254.169.253 and netmask 255.255.255.0.
> -The host will be on the same network, with the IP address: 169.254.169.254.
> -
> -The way to create this interface depends on the specific hypervisor being used.
> -In KVM, it is possible to create a network interface inside the instance without
> -having a corresponding interface created on the host. Using a command like::
> -
> -  kvm -net nic -net \
> -    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> -    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080
> -
> -a network interface will be created inside the VM, part of the 169.254.169.0/24
> -network, where the VM will have IP address .253 and the host port 8080 will be
> -reachable on port 80.
> -
> -In Xen, unfortunately, such a capability is not present, and an actual network
> -interface has to be created on the host (using the ``vif`` parameter of the Xen
> -configuration file). Each instance will have its corresponding ``vif`` network
> -interface on the host. These interfaces will not be connected to each other in
> -any way, and Ganeti will not configure them to allow traffic to be forwarded
> -beyond the host machine. The ``vif-route`` script of Xen might be helpful in
> -implementing this.
> -It will be the system administrator's responsibility to ensure that the extra
> -firewalling and routing rules specified on the host don't allow this
> -accidentally.
> -
> -The instance will be able to connect to 169.254.169.254:80, and issue GET
> -requests to an HTTP server that will provide the instance metadata.
> -
> -The choice of this IP address and port for accessing the metadata is done for
> -compatibility reasons with OpenStack's and Amazon EC2's ways of providing
> -metadata to the instance. The metadata will be provided by a single daemon,
> -which will determine what instance the request comes from and reply with the
> -metadata specific for that instance.
> +procedure that requires a virtualized environment, but, for backwards
> +compatibility, it will be disabled when the instance is running normally, unless
> +explicitly requested. Specifically, a new parameter ``--communication=yes|no``
> +(short version: ``-C``) will be added to ``gnt-instance add`` and ``gnt-instance
> +modify``. This parameter will determine whether the communication mechanism is
> +enabled for a particular instance. The value of this parameter will be saved as
> +part of the instance's configuration.
> +
> +The communication mechanism will be implemented through network interfaces on
> +the host and the guest. The host will create a TAP network interface for each
> +guest. This network interface will be connected to the guest's last network
> +interface, which is meant to be used exclusively for the communication mechanism
> +and is defined after all the used-defined interfaces. Moreover, the network
> +interfaces provide a communication channel that is solely used by the host and
> +each guest, therefore, a guest cannot use this network interface to reach the
> +outside world or other guests. It is the system administrator's responsibility
> +to ensure that the extra firewalling and routing rules specified on the host do
> +not override this behaviour accidentally.
> +
> +On the host side, these TAP network interfaces will have IP address
> +``169.254.169.254`` in the network ``169.254.0.0/16`` (i.e., netmask
> +``255.255.0.0``). On the guest side, each instance will have its own MAC
> +address and an IP address in the network ``169.254.0.0/16``. The MAC address
> +and the IP address must be unique within a single host. The guest will use the

It's not very clear to me who will be responsible for setting up the host side
of the TAP interfaces. Who will be responsible for assigning the IP address
169.254.169.254 on all TAP interfaces of the host, and what will the routing
rules be?

To clarify, say I have 3 VMs, on tap0, tap1 and tap2, with IPs 169.254.0.1,
169.254.0.2 and 169.254.0.3 respectively. If the host has IP 169.254.169.254
on all interfaces, with the same /16 netmask, how will it be able to pick the
right interface when sending an IP packet to VM1 vs. when sending to VM3?
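To make the question concrete, here is a minimal Python sketch of a
longest-prefix-match route lookup. It is a simplification (it ignores route
metrics, scopes and policy routing), and the routing table and the
``candidates``/``with_host_routes`` helpers are hypothetical, built only from
the tap0..tap2 example above. With three identical 169.254.0.0/16 connected
routes, the lookup has no basis for choosing the right TAP; a /32 host route
per guest wins on prefix length and selects exactly one interface.

```python
import ipaddress

# Hypothetical routing table: (destination network, output interface).
# The three connected /16 routes result from assigning 169.254.169.254/16
# to every TAP interface; they are indistinguishable by prefix length.
ROUTES = [
    (ipaddress.ip_network("169.254.0.0/16"), "tap0"),
    (ipaddress.ip_network("169.254.0.0/16"), "tap1"),
    (ipaddress.ip_network("169.254.0.0/16"), "tap2"),
]

# Hypothetical guests from the example: (guest IP, its TAP interface).
GUESTS = [("169.254.0.1", "tap0"), ("169.254.0.2", "tap1"),
          ("169.254.0.3", "tap2")]


def candidates(routes, dst):
    """Return every interface whose route has the longest matching prefix."""
    addr = ipaddress.ip_address(dst)
    matching = [(net, dev) for net, dev in routes if addr in net]
    if not matching:
        return []
    best = max(net.prefixlen for net, _ in matching)
    return [dev for net, dev in matching if net.prefixlen == best]


def with_host_routes(routes, guests):
    """Add one /32 host route per guest (guest IP -> its TAP interface)."""
    extra = [(ipaddress.ip_network(ip + "/32"), dev) for ip, dev in guests]
    return routes + extra


print(candidates(ROUTES, "169.254.0.3"))  # ['tap0', 'tap1', 'tap2']
print(candidates(with_host_routes(ROUTES, GUESTS), "169.254.0.3"))  # ['tap2']
```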

I think this could work with explicit routes: one to 169.254.0.1/32 through
tap0, one to 169.254.0.2/32 through tap1, and one to 169.254.0.3/32 through
tap2. If so, will Ganeti set up these routes explicitly?

On a similar note, who will be responsible for setting up the DHCP server? It
could be the administrator's responsibility, but then, if Ganeti is the entity
that picks the MAC addresses and IPs for the guest side of the TAP interfaces,
how will this DHCP server be notified, so as to only serve the correct IP
addresses to specific MAC addresses?

Also, if it is the administrator's responsibility, then perhaps the admin
should be able to set up standard ifup hooks, like for every other interface
of an instance. But in the following example, you specifically set
script=no,downscript=no.

Another possibility would be for Ganeti to come prepackaged with its own,
embedded DHCP server just for serving requests on the TAPs used for the
communication mechanism. We've been using snf-nfdhcpd
(https://code.grnet.gr/projects/snf-nfdhcpd) for just that in production.
Actually, in a previous conversation Guido had asked us to document how to set
it up with Ganeti, and merge the resulting docs with the Ganeti upstream.
Perhaps it would make sense to combine the effort now, and use snf-nfdhcpd as
an embedded DHCP server with Ganeti. Sorry for not having documented it
earlier.

> +DHCP protocol on its last network interface to contact a DHCP server running on
> +the host and thus determine its IP address. The DHCP server will be listening
> +exclusively on the TAP network interfaces of the guests. Therefore, it will not
> +interfere with a potential DHCP server running on the same host. Furthermore,
> +the DHCP server will only recognize MAC and IP address pairs that have been
> +approved by Ganeti.
> +
> +The TAP network interfaces created for each guest all share the same IP address.
> +Therefore, it will be necessary to extend the routing table with rules specific
> +to each guest. This can be achieved with the following command, which takes the
> +guest's unique IP address and its TAP interface::
> +
> +  route add -host <ip> dev <ifname>
> +
> +For KVM, an instance will be started with a unique MAC address and the TAP
> +network interface name meant to be used by the communication mechanism. KVM
> +creates the actual interface::
> +
> +  kvm -net nic,macaddr=<mac> -net tap,ifname=<ifname>,script=no,downscript=no ...
> +

If I understand correctly, in previous versions of Ganeti it used to be the
case that KVM opened the actual TAP interface upon initialization of the KVM
process. This was changed, however (see commit 5d9bfd870a), so that Ganeti
itself creates the TAP interface, then passes it as an open file descriptor to
the KVM process. Is there any reason to deviate from this, and make handling
the TAP interface for the communication mechanism a special case?

Also, the same question as above applies. If setting up the DHCP server is the
responsibility of the administrator, then perhaps Ganeti should support running
ifup hooks for the TAPs. Or, Ganeti could come with its own embedded DHCP
server and handle everything by itself, without messing with an already
existing DHCP server.

Thanks,
Vangelis.

> +For Xen, a network interface will be created on the host (using the ``vif``
> +parameter of the Xen configuration file). Each instance will have its
> +corresponding ``vif`` network interface on the host. The ``vif-route`` script
> +of Xen might be helpful in implementing this.
> +
> +
> +Metadata service
> ++++++++++++++++++
> +
> +An instance will be able to reach metadata service on ``169.254.169.254:80`` in
> +order to, for example, retrieve its metadata. This IP address and port were
> +chosen for compatibility with the OpenStack and Amazon EC2 metadata service.
> +The metadata service will be provided by a single daemon, which will determine
> +the source instance for a given request and reply with the metadata pertaining
> +to that instance.
>
>  Where possible, the metadata will be provided in a way compatible with Amazon
>  EC2, at::
>
>    http://169.254.169.254/<version>/meta-data/*
>
> -If some metadata are Ganeti-specific and don't fit this structure, they will be
> -provided at::
> +Ganeti-specific metadata, that does not fit this structure, will be provided
> +at::
>
>    http://169.254.169.254/ganeti/<version>/meta_data.json
>
> -``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
> -the most recent available protocol version.
> +where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to
> +indicate the most recent available protocol version.
>
>  If needed in the future, this structure also allows us to support OpenStack's
>  metadata at::
>
>    http://169.254.169.254/openstack/<version>/meta_data.json
>
> -A bi-directional, pipe-like communication channel will be provided. The instance
> -will be able to receive data from the host by a GET request at::
> +A bi-directional, pipe-like communication channel will also be provided. The
> +instance will be able to receive data from the host by a GET request at::
>
>    http://169.254.169.254/ganeti/<version>/read
>
> @@ -331,12 +341,10 @@ and to send data to the host by a POST request at::
>    http://169.254.169.254/ganeti/<version>/write
>
>  As in a pipe, once the data are read, they will not be in the buffer anymore, so
> -subsequent GET requests to ``read`` will not return the same data twice.
> -Unlike a pipe, though, it will not be possible to perform blocking I/O
> -operations.
> +subsequent GET requests to ``read`` will not return the same data. However,
> +unlike a pipe, it will not be possible to perform blocking I/O operations.
>
> -The OS parameters will be accessible through a GET
> -request at::
> +The OS parameters will be accessible through a GET request at::
>
>    http://169.254.169.254/ganeti/<version>/os/parameters.json
>
> @@ -424,8 +432,61 @@ the total time allowed to setup an instance inside the appliance. It is mainly
>  meant as a safety measure to prevent an instance taken over by malicious scripts
>  to be available for a long time.
>
> -.. vim: set textwidth=72 :
> -.. Local Variables:
> -.. mode: rst
> -.. fill-column: 72
> -.. End:
> +
> +Port forwarding in KVM
> ++++++++++++++++++++++++
> +
> +The communication mechanism could have been implemented in KVM using guest port
> +forwarding, as opposed to network interfaces. There are two alternatives in
> +KVM's guest port forwarding, namely, creating a forwarding device, such as, a
> +TCP/IP connection, or executing a command. However, we have determined that
> +both of these options are not viable.
> +
> +A TCP/IP forwarding device can be created through the following KVM invocation::
> +
> +  kvm -net nic -net \
> +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> +    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ...
> +
> +This invocation even has advantage that it can remap ports, which would have
> +allowed the metadata service daemon to run in port 8080 instead of 80. However,
> +in this scheme, KVM opens the TCP connection only once, when it is started, and,
> +if the connection breaks, KVM will not reconnect. Furthermore, this also
> +interferes with the HTTP protocol, which needs to dynamically establish and
> +close connections.
> +
> +The alternative to opening a single TCP/IP connection is to execute a command.
> +The KVM invocation for this is, for example, the following::
> +
> +  kvm -net nic -net \
> +    "user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> +    guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ...
> +
> +The advantage of this approach is that the command is executed each time the
> +guest initiates a connection. This is the ideal situation, however, it is only
> +supported in KVM 1.2 and above, and, therefore, not viable because we want to
> +provide support for at least KVM version 1.0, which is the version provided by
> +Ubuntu LTS.
> +
> +
> +Alternatives to the DHCP server
> +++++++++++++++++++++++++++++++++
> +
> +There are alternatives to using the DHCP server, for example, by assigning
> +identical IP addresses to guests, such as, the IP address ``169.254.169.253``.
> +However, this introduces a routing problem, namely, how to route incoming
> +packets from the same source IP to the host. This problem can be overcome in a
> +number of ways.
> +
> +The first solution is to use NAT to translate the incoming guest IP address, for
> +example, ``169.254.169.253``, to an IP address unique within a single host, for
> +example, ``169.254.0.1``. Given that NAT through ``ip rule`` is deprecated,
> +users can resort to ``iptables``. Note that this has not yet been tested.
> +
> +Another option, which has indeed been tested in a prototype, is to connect the
> +TAP network interfaces of the guests to a bridge. The bridge takes the
> +configuration for the TAP network interfaces, namely, IP address
> +``169.254.169.254`` and netmask ``255.255.0.0``, thus leaving those interfaces
> +without an IP address. Note that in this setting, guests will be able to reach
> +each other, therefore, if necessary, additional ``iptables`` rules can be put in
> +place to prevent it.
> --
> 1.8.5.1

--
Vangelis Koukis
[email protected]
OpenPGP public key ID:
pub 1024D/1D038E97 2003-07-13 Vangelis Koukis <[email protected]>
Key fingerprint = C5CD E02E 2C78 7C10 8A00 53D8 FBFC 3799 1D03 8E97

Only those who will risk going too far can possibly find out how far one can go.
-- T.S. Eliot
