On Fri, Jan 10, 2014 at 05:50:32PM +0200, Vangelis Koukis wrote: > On Fri, Jan 10, 2014 at 03:02:39pm +0100, Jose A. Lopes wrote: > > Redesign the communication mechanism in light of implementation > > limitations that have recently come up in prototypes using KVM. > > > > Signed-off-by: Jose A. Lopes <[email protected]> > > Hello Jose, > > a few comments/questions follow inline. > > > --- > > doc/design-os.rst | 213 > > +++++++++++++++++++++++++++++++++++------------------- > > 1 file changed, 137 insertions(+), 76 deletions(-) > > > > diff --git a/doc/design-os.rst b/doc/design-os.rst > > index 6281682..5387b9a 100644 > > --- a/doc/design-os.rst > > +++ b/doc/design-os.rst > > @@ -75,8 +75,8 @@ In order to fix the shortcomings of the current state, we > > plan to introduce the > > following changes. > > > > > > -OS parameters categories > > -++++++++++++++++++++++++ > > +OS parameter categories > > ++++++++++++++++++++++++ > > > > Change the OS parameters to have three categories: > > > > @@ -110,7 +110,7 @@ instance to communicate its progress to the host). Each > > instance will have > > access exclusively to its own metadata, and it will be only able to > > communicate > > with its host over this channel. This is the approach followed the > > ``cloud-init`` tool and more details will be provided in the `Communication > > -mechanism and metadata service`_ section. > > +mechanism`_ and `Metadata service`_ sections. > > > > > > Installation procedure > > @@ -242,87 +242,97 @@ Some of these steps need to be more deeply specified > > w.r.t. what is already > > written in the `Proposed changes`_ Section. Extra details will be provided > > in > > the following subsections. > > > > -Communication mechanism and metadata service > > -++++++++++++++++++++++++++++++++++++++++++++ > > +Communication mechanism > > ++++++++++++++++++++++++ > > > > -The communication mechanism and the metadata service are described together > > -because they are deeply tied. 
The communication mechanism will be made more > > -generic because it can be used for other purposes in the future (like > > allowing > > -instances to explicitly send commands to Ganeti, or to let Ganeti control a > > -helper instance, like the one hereby introduced for performing OS installs > > -inside a safe environment). > > +The communication mechanism will be a generic communication channel between > > +Ganeti and the instances, not only to provide access to the metadata > > service, > > +but also to allow instances to send commands directly to Ganeti or request > > +changes to parameters, such as, those related to the distribution > > upgrades, or > > +even let Ganeti control a helper instance, such as, the one for performing > > OS > > +installs inside a safe environment, as introduced in this document. > > > > The communication mechanism will be enabled automatically during an > > installation > > -procedure that requires a virtualized environment, but for backwards > > -compatibility it will be disabled when the instance is running normally, > > unless > > -it is explicitly requested. Specifically, a new parameter > > -``--communication=yes|no`` (short version: ``-C``) will be added to > > -``gnt-instance add`` and ``gnt-instance modify``. It will determine > > whether the > > -instance has a communication channel set to interact with the host and > > receive > > -metadata. The value of this parameter will be saved as part of the > > configuration > > -of the instance. > > - > > -When the communication mechanism is enabled, Ganeti will create a new > > network > > -interface inside the instance. This additional network interface will be > > the > > -last one in the instance, after all the user defined ones. On the host > > side, > > -this interface will only be accessible to the host itself, and not routed > > -outside the machine. > > -On this network interface, the instance will connect using the IP: > > -169.254.169.253 and netmask 255.255.255.0. 
> > -The host will be on the same network, with the IP address: 169.254.169.254. > > - > > -The way to create this interface depends on the specific hypervisor being > > used. > > -In KVM, it is possible to create a network interface inside the instance > > without > > -having a corresponding interface created on the host. Using a command > > like:: > > - > > - kvm -net nic -net \ > > - user,restrict=on,net=169.254.169.0/24,host=169.254.169.253, > > - guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 > > - > > -a network interface will be created inside the VM, part of the > > 169.254.169.0/24 > > -network, where the VM will have IP address .253 and the host port 8080 > > will be > > -reachable on port 80. > > - > > -In Xen, unfortunately, such a capability is not present, and an actual > > network > > -interface has to be created on the host (using the ``vif`` parameter of > > the Xen > > -configuration file). Each instance will have its corresponding ``vif`` > > network > > -interface on the host. These interfaces will not be connected to each > > other in > > -any way, and Ganeti will not configure them to allow traffic to be > > forwarded > > -beyond the host machine. The ``vif-route`` script of Xen might be helpful > > in > > -implementing this. > > -It will be the system administrator's responsibility to ensure that the > > extra > > -firewalling and routing rules specified on the host don't allow this > > -accidentally. > > - > > -The instance will be able to connect to 169.254.169.254:80, and issue GET > > -requests to an HTTP server that will provide the instance metadata. > > - > > -The choice of this IP address and port for accessing the metadata is done > > for > > -compatibility reasons with OpenStack's and Amazon EC2's ways of providing > > -metadata to the instance. The metadata will be provided by a single daemon, > > -which will determine what instance the request comes from and reply with > > the > > -metadata specific for that instance. 
> > +procedure that requires a virtualized environment, but, for backwards > > +compatibility, it will be disabled when the instance is running normally, > > unless > > +explicitly requested. Specifically, a new parameter > > ``--communication=yes|no`` > > +(short version: ``-C``) will be added to ``gnt-instance add`` and > > ``gnt-instance > > +modify``. This parameter will determine whether the communication > > mechanism is > > +enabled for a particular instance. The value of this parameter will be > > saved as > > +part of the instance's configuration. > > + > > +The communication mechanism will be implemented through network interfaces > > on > > +the host and the guest. The host will create a TAP network interface for > > each > > +guest. This network interface will be connected to the guest's last > > network > > +interface, which is meant to be used exclusively for the communication > > mechanism > > +and is defined after all the used-defined interfaces. Moreover, the > > network > > +interfaces provide a communication channel that is solely used by the host > > and > > +each guest, therefore, a guest cannot use this network interface to reach > > the > > +outside world or other guests. It is the system administrator's > > responsibility > > +to ensure that the extra firewalling and routing rules specified on the > > host do > > +not override this behaviour accidentally. > > + > > +On the host side, these TAP network interfaces will have IP address > > +``169.254.169.254`` in the network ``169.254.0.0/16`` (i.e., netmask > > +``255.255.0.0``). On the guest side, each instance will have its own MAC > > +address and an IP address in the network ``169.254.0.0/16``. The MAC > > address > > +and the IP address must be unique within a single host. The guest will use > > the > > It's not very clear to me, who will be responsible for setting up > the host side of the TAP interfaces. 
> Who will be responsible for assigning the IP address 169.254.169.254
> on all TAP interfaces of the host, and what will the routing rules be?
> To clarify, say I have 3 VMs, on tap0, tap1 and tap2, with IPs
> 169.254.0.1, 169.254.0.2, 169.254.0.3 respectively.
>
> If the host has IP 169.254.169.254 on all interfaces, with the same /16
> netmask, how will it be able to pick the right interface when sending an
> IP packet to VM1 vs. when sending to VM3?
>
> I think this could work with explicit routes: one to 169.254.0.1/32
> through tap0, one to 169.254.0.2/32 through tap1, and one to
> 169.254.0.3/32 through tap2. If yes, will Ganeti set up these routes
> explicitly?
>
> On a similar note, who will be responsible for setting up the DHCP
> server? It could be the administrator's responsibility, but then, if
> Ganeti is the entity that picks the MAC addresses and IPs for the guest
> side of the TAP interfaces, how will this DHCP server be notified, so as
> to only serve the correct IP addresses to specific MAC addresses?
>
> Also, if it is the administrator's responsibility, then perhaps the
> admin should be able to set up standard ifup hooks, like for every
> other interface of an instance. But in the following examples, you
> specifically set script=no,downscript=no.
>
> Another possibility would be for Ganeti to come prepackaged with its
> own, embedded DHCP server just for serving requests on the TAPs used for
> the communication mechanism. We've been using snf-nfdhcpd
> (https://code.grnet.gr/projects/snf-nfdhcpd) for just that in
> production.
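
On the explicit routes suggested above: the patch proposes one
``route add -host <ip> dev <ifname>`` rule per guest. The following is
only a rough sketch of how those commands could be derived for the
three-VM example. The function name ``host_route_commands`` and the
mapping from TAP name to guest IP are hypothetical and not part of
Ganeti; the sketch only builds the command strings and does not run them.

```python
import ipaddress

# Values taken from the design document under review.
HOST_IP = ipaddress.ip_address("169.254.169.254")
COMM_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def host_route_commands(guests):
    """Build one host route per guest (hypothetical helper).

    `guests` maps a TAP interface name to the guest's IP address.
    Returns the `route add -host` command strings, sorted by interface.
    """
    seen = set()
    commands = []
    for ifname, ip in sorted(guests.items()):
        addr = ipaddress.ip_address(ip)
        if addr not in COMM_NETWORK:
            raise ValueError(f"{ip} is outside {COMM_NETWORK}")
        if addr == HOST_IP:
            raise ValueError(f"{ip} is reserved for the host side")
        if addr in seen:
            # The design requires guest IPs to be unique within a host.
            raise ValueError(f"{ip} is assigned to more than one guest")
        seen.add(addr)
        commands.append(f"route add -host {addr} dev {ifname}")
    return commands


if __name__ == "__main__":
    example = {
        "tap0": "169.254.0.1",
        "tap1": "169.254.0.2",
        "tap2": "169.254.0.3",
    }
    for command in host_route_commands(example):
        print(command)
```

With one host route per TAP interface, the shared 169.254.169.254
address no longer leaves the choice of outgoing interface ambiguous,
because each guest IP matches exactly one /32 route.
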
In snf-nfdhcpd, how do you configure which interfaces it listens on and
which (MAC, IP) pairs it serves? I couldn't find any documentation, and
the configuration file does not seem to contain these settings.

> Actually, in a previous conversation Guido had asked us to document how
> to set it up with Ganeti, and merge the resulting docs with the Ganeti
> upstream. Perhaps it would make sense to combine the effort now, and use
> snf-nfdhcpd as an embedded DHCP server with Ganeti. Sorry for not having
> documented it earlier.
>
> > +DHCP protocol on its last network interface to contact a DHCP server running on
> > +the host and thus determine its IP address. The DHCP server will be listening
> > +exclusively on the TAP network interfaces of the guests. Therefore, it will not
> > +interfere with a potential DHCP server running on the same host. Furthermore,
> > +the DHCP server will only recognize MAC and IP address pairs that have been
> > +approved by Ganeti.
> > +
> > +The TAP network interfaces created for each guest all share the same IP address.
> > +Therefore, it will be necessary to extend the routing table with rules specific
> > +to each guest. This can be achieved with the following command, which takes the
> > +guest's unique IP address and its TAP interface::
> > +
> > +  route add -host <ip> dev <ifname>
> > +
> > +For KVM, an instance will be started with a unique MAC address and the TAP
> > +network interface name meant to be used by the communication mechanism. KVM
> > +creates the actual interface::
> > +
> > +  kvm -net nic,macaddr=<mac> -net tap,ifname=<ifname>,script=no,downscript=no ...
> > +
>
> If I understand correctly, in previous versions of Ganeti it used to be
> the case that KVM opened the actual TAP interface, upon initialization
> of the KVM process. This was changed however (see commit 5d9bfd870a) so
> that Ganeti itself created the TAP interface, then passed it as an open
> file descriptor to the KVM process.
Is there any reason to deviate from > this, and make handling the TAP interface for the communication > mechanism a special case? > > Also, the same question applies as above. If setting up the DHCP server > is the responsibility of the administrator, then perhaps Ganeti should > support running ifup hooks for the TAPs. Or, Ganeti could come with its > own embedded DHCP server and handle everything by itself, without > messing with an already existing DHCP server. > > Thanks, > Vangelis. > > > +For Xen, a network interface will be created on the host (using the ``vif`` > > +parameter of the Xen configuration file). Each instance will have its > > +corresponding ``vif`` network interface on the host. The ``vif-route`` > > script > > +of Xen might be helpful in implementing this. > > + > > + > > +Metadata service > > +++++++++++++++++ > > + > > +An instance will be able to reach metadata service on > > ``169.254.169.254:80`` in > > +order to, for example, retrieve its metadata. This IP address and port > > were > > +chosen for compatibility with the OpenStack and Amazon EC2 metadata > > service. > > +The metadata service will be provided by a single daemon, which will > > determine > > +the source instance for a given request and reply with the metadata > > pertaining > > +to that instance. > > > > Where possible, the metadata will be provided in a way compatible with > > Amazon > > EC2, at:: > > > > http://169.254.169.254/<version>/meta-data/* > > > > -If some metadata are Ganeti-specific and don't fit this structure, they > > will be > > -provided at:: > > +Ganeti-specific metadata, that does not fit this structure, will be > > provided > > +at:: > > > > http://169.254.169.254/ganeti/<version>/meta_data.json > > > > -``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to > > indicate > > -the most recent available protocol version. 
> > +where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to > > +indicate the most recent available protocol version. > > > > If needed in the future, this structure also allows us to support > > OpenStack's > > metadata at:: > > > > http://169.254.169.254/openstack/<version>/meta_data.json > > > > -A bi-directional, pipe-like communication channel will be provided. The > > instance > > -will be able to receive data from the host by a GET request at:: > > +A bi-directional, pipe-like communication channel will also be provided. > > The > > +instance will be able to receive data from the host by a GET request at:: > > > > http://169.254.169.254/ganeti/<version>/read > > > > @@ -331,12 +341,10 @@ and to send data to the host by a POST request at:: > > http://169.254.169.254/ganeti/<version>/write > > > > As in a pipe, once the data are read, they will not be in the buffer > > anymore, so > > -subsequent GET requests to ``read`` will not return the same data twice. > > -Unlike a pipe, though, it will not be possible to perform blocking I/O > > -operations. > > +subsequent GET requests to ``read`` will not return the same data. > > However, > > +unlike a pipe, it will not be possible to perform blocking I/O operations. > > > > -The OS parameters will be accessible through a GET > > -request at:: > > +The OS parameters will be accessible through a GET request at:: > > > > http://169.254.169.254/ganeti/<version>/os/parameters.json > > > > @@ -424,8 +432,61 @@ the total time allowed to setup an instance inside the > > appliance. It is mainly > > meant as a safety measure to prevent an instance taken over by malicious > > scripts > > to be available for a long time. > > > > -.. vim: set textwidth=72 : > > -.. Local Variables: > > -.. mode: rst > > -.. fill-column: 72 > > -.. 
End: > > + > > +Port forwarding in KVM > > +++++++++++++++++++++++ > > + > > +The communication mechanism could have been implemented in KVM using guest > > port > > +forwarding, as opposed to network interfaces. There are two alternatives > > in > > +KVM's guest port forwarding, namely, creating a forwarding device, such > > as, a > > +TCP/IP connection, or executing a command. However, we have determined > > that > > +both of these options are not viable. > > + > > +A TCP/IP forwarding device can be created through the following KVM > > invocation:: > > + > > + kvm -net nic -net \ > > + user,restrict=on,net=169.254.169.0/24,host=169.254.169.253, > > + guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ... > > + > > +This invocation even has advantage that it can remap ports, which would > > have > > +allowed the metadata service daemon to run in port 8080 instead of 80. > > However, > > +in this scheme, KVM opens the TCP connection only once, when it is > > started, and, > > +if the connection breaks, KVM will not reconnect. Furthermore, this also > > +interferes with the HTTP protocol, which needs to dynamically establish and > > +close connections. > > + > > +The alternative to opening a single TCP/IP connection is to execute a > > command. > > +The KVM invocation for this is, for example, the following:: > > + > > + kvm -net nic -net \ > > + "user,restrict=on,net=169.254.169.0/24,host=169.254.169.253, > > + guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ... > > + > > +The advantage of this approach is that the command is executed each time > > the > > +guest initiates a connection. This is the ideal situation, however, it is > > only > > +supported in KVM 1.2 and above, and, therefore, not viable because we want > > to > > +provide support for at least KVM version 1.0, which is the version > > provided by > > +Ubuntu LTS. 
> > + > > + > > +Alternatives to the DHCP server > > ++++++++++++++++++++++++++++++++ > > + > > +There are alternatives to using the DHCP server, for example, by assigning > > +identical IP addresses to guests, such as, the IP address > > ``169.254.169.253``. > > +However, this introduces a routing problem, namely, how to route incoming > > +packets from the same source IP to the host. This problem can be overcome > > in a > > +number of ways. > > + > > +The first solution is to use NAT to translate the incoming guest IP > > address, for > > +example, ``169.254.169.253``, to an IP address unique within a single > > host, for > > +example, ``169.254.0.1``. Given that NAT through ``ip rule`` is > > deprecated, > > +users can resort to ``iptables``. Note that this has not yet been tested. > > + > > +Another option, which has indeed been tested in a prototype, is to connect > > the > > +TAP network interfaces of the guests to a bridge. The bridge takes the > > +configuration for the TAP network interfaces, namely, IP address > > +``169.254.169.254`` and netmask ``255.255.0.0``, thus leaving those > > interfaces > > +without an IP address. Note that in this setting, guests will be able to > > reach > > +each other, therefore, if necessary, additional ``iptables`` rules can be > > put in > > +place to prevent it. > > -- > > 1.8.5.1 > > -- > Vangelis Koukis > [email protected] > OpenPGP public key ID: > pub 1024D/1D038E97 2003-07-13 Vangelis Koukis <[email protected]> > Key fingerprint = C5CD E02E 2C78 7C10 8A00 53D8 FBFC 3799 1D03 8E97 > > Only those who will risk going too far > can possibly find out how far one can go. > -- T.S. Eliot -- Jose Antonio Lopes Ganeti Engineering Google Germany GmbH Dienerstr. 12, 80331, München Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Graham Law, Christine Elizabeth Flores Steuernummer: 48/725/00206 Umsatzsteueridentifikationsnummer: DE813741370
