On Wed, Jan 15, 2014 at 12:51:31PM +0100, Thomas Thrainer wrote:
> On Wed, Jan 15, 2014 at 12:13 PM, Vangelis Koukis <[email protected]> wrote:
>
> > On Tue, Jan 14, 2014 at 05:25:20pm +0100, Jose A. Lopes wrote:
> > > > > > In a similar note, who will be responsible for setting up the DHCP
> > > > > > server? It could be the administrator's responsibility, but then if it
> > > > > > is Ganeti the entity which picks the MAC addresses and IPs for the guest
> > > > > > side of the TAP interfaces, how will this DHCP server be notified, so as
> > > > > > to only serve the correct IP addresses to specific MAC addresses?
> > > > > >
> > > > > Ganeti configures the DHCP server, starts it and stops it. Ganeti
> > > > > also reconfigures the DHCP server when a new VM is started/stopped.
> > > > > The DHCP server listens only on the TAP interfaces for the VMs so it
> > > > > shouldn't interfere with other DHCP servers running on the host. I
> > > > > will make it more clear in the design doc.
> > > > >
> > > > > Currently, I have only experimented with 'dnsmasq'. This DHCP server
> > > > > allows all of the above. The only thing that could be improved is the
> > > > > fact that it is not possible to dynamically extend the interfaces
> > > > > 'dnsmasq' is listening to. Therefore, it is necessary to update the
> > > > > configuration file and restart 'dnsmasq'.
> > > > >
> > > > >
> > > > Have you been able to give specific (tap, MAC, IP) tuples to dnsmasq,
> > > > somehow binding a MAC address on a specific TAP interface?
> > > > In other words, how do you instruct dnsmasq to only honor a DHCP request
> > > > from a specific MAC address, if it only comes from a specific TAP?
> > > >
> > > > I'm looking at the dnsmasq manpage for the "--dhcp-host" argument:
> > > >
> > > > -G, --dhcp-host=[<hwaddr>][,id:<client_id>|*][,set:<tag>][,<ipaddr>][,<hostname>][,<lease_time>][,ignore]
> > > >
> > > > and can't seem to find an obvious way to do it.
> > >
> > > The way I am doing it right now is by placing the following in the dnsmasq
> > > configuration file:
> > >
> > >   interface=vm1,vm2
> > >
> > >   dhcp-host=52:54:00:12:34:56,169.254.0.1
> > >   dhcp-host=52:54:00:65:43:21,169.254.0.2
> > >
> > > where 'interface' is the set of interfaces to listen on and
> > > 'dhcp-host' specifies the bindings. Naturally, this can be changed to
> > > listen only on one interface, etc. See the files in attachment for
> > > more examples.
> > >
> >
> > Hello Jose,
> >
> > it's not clear to me how this enforces a clear association between a
> > single TAP interface and a single MAC address. I understand that the
> > DHCP server listens on both interfaces, vm1 and vm2, but where is the
> > association made that the VM on interface "vm1" may only use MAC address
> > 52:54:00:12:34:56 (first one), and not 52:54:00:65:43:21 (the second one)?
> >
> > Also, how do you plan to update the interfaces on which dnsmasq listens,
> > and the contents of its TAP<->MAC<->IP database, dynamically? Rewrite
> > the configuration files, then restart the server? I think this will be a
> > source of problems, especially in more cloud-like environments, where
> > VMs are expected to go up and down at high rates.
> >
>
> dnsmasq reloads its configuration files on SIGHUP, AFAIK.
>
> Just my 2c...

This would be interesting for the leases but unfortunately it does not
work for the interfaces, which are fundamental. From the manpage:

"
  When it receives a SIGHUP, dnsmasq clears its cache and then re-loads
  /etc/hosts and /etc/ethers and any file given by --dhcp-hostsfile,
  --dhcp-optsfile or --addn-hosts. The dhcp lease change script is
  called for all existing DHCP leases. If --no-poll is set SIGHUP also
  re-reads /etc/resolv.conf. SIGHUP does NOT re-read the configuration
  file.
"

Thanks for the input,
Jose

> >
> > Thank you,
> > Vangelis.
> >
> > > > Can you share more information on your experimental setup?
> > > > Is every TAP interface independent, do you have them all on a bridge?
> > >
> > > My experimental setup is:
> > >
> > > 1. sudo ./dns1.sh
> > >
> > >    This starts the first KVM instance and configures its TAP interface,
> > >    routes, etc, and also starts the DHCP server with configuration file
> > >    dnsmasq1.conf
> > >
> > > 2. sudo ./dns2.sh
> > >
> > >    This starts the second KVM instance and configures the stuff as
> > >    well, and restarts the DHCP server.
> > >
> > > I have included all files from my experimental setup in attachment if
> > > you want to have a look.
> > >
> > > > I'll come back to the issue of updating dnsmasq configurations and handling
> > > > multiple TAP interfaces concurrently in a reply to your other mails about
> > > > nfdhcpd.
> > > >
> > > > Thanks,
> > > > Vangelis.
> > > >
> > > > > > Also, if it is the administrator's responsibility, then perhaps the
> > > > > > admin should be able to set up standard ifup hooks, like for every
> > > > > > other interface of an instance. But in the following examples, you
> > > > > > specifically set script=no,downscript=no.
> > > > > >
> > > > > > Another possibility would be for Ganeti to come prepackaged with its
> > > > > > own, embedded DHCP server just for serving requests on the TAPs used for
> > > > > > the communication mechanism.
> > > > > > We've been using snf-nfdhcpd
> > > > > > (https://code.grnet.gr/projects/snf-nfdhcpd) for just that in
> > > > > > production.
> > > > > >
> > > > > > Actually, in previous conversation Guido had asked us to document how to
> > > > > > set it up with Ganeti, and merge the resulting docs with the Ganeti
> > > > > > upstream. Perhaps it would make sense to combine the effort now, and use
> > > > > > snf-nfdhcpd as an embedded DHCP server with Ganeti. Sorry for not having
> > > > > > documented it earlier.
> > > > >
> > > > > I'm going to have a look at this and ask Guido about it.
> > > > >
> > > > > > > +DHCP protocol on its last network interface to contact a DHCP server running on
> > > > > > > +the host and thus determine its IP address. The DHCP server will be listening
> > > > > > > +exclusively on the TAP network interfaces of the guests. Therefore, it will not
> > > > > > > +interfere with a potential DHCP server running on the same host. Furthermore,
> > > > > > > +the DHCP server will only recognize MAC and IP address pairs that have been
> > > > > > > +approved by Ganeti.
> > > > > > > +
> > > > > > > +The TAP network interfaces created for each guest all share the same IP address.
> > > > > > > +Therefore, it will be necessary to extend the routing table with rules specific
> > > > > > > +to each guest. This can be achieved with the following command, which takes the
> > > > > > > +guest's unique IP address and its TAP interface::
> > > > > > > +
> > > > > > > +  route add -host <ip> dev <ifname>
> > > > > > > +
> > > > > > > +For KVM, an instance will be started with a unique MAC address and the TAP
> > > > > > > +network interface name meant to be used by the communication mechanism. KVM
> > > > > > > +creates the actual interface::
> > > > > > > +
> > > > > > > +  kvm -net nic,macaddr=<mac> -net tap,ifname=<ifname>,script=no,downscript=no ...
> > > > > > > +
> > > > > >
> > > > > > If I understand correctly, in previous versions of Ganeti it used to be
> > > > > > the case that KVM opened the actual TAP interface, upon initialization
> > > > > > of the KVM process. This was changed however (see commit 5d9bfd870a) so
> > > > > > that Ganeti itself created the TAP interface, then passed it as an open
> > > > > > file descriptor to the KVM process. Is there any reason to deviate from
> > > > > > this, and make handling the TAP interface for the communication
> > > > > > mechanism a special case?
> > > > > >
> > > > > > Also, the same question applies as above. If setting up the DHCP server
> > > > > > is the responsibility of the administrator, then perhaps Ganeti should
> > > > > > support running ifup hooks for the TAPs. Or, Ganeti could come with its
> > > > > > own embedded DHCP server and handle everything by itself, without
> > > > > > messing with an already existing DHCP server.
> > > > > >
> > > > > > Thanks,
> > > > > > Vangelis.
> > > > > >
> > > > > > > +For Xen, a network interface will be created on the host (using the ``vif``
> > > > > > > +parameter of the Xen configuration file). Each instance will have its
> > > > > > > +corresponding ``vif`` network interface on the host. The ``vif-route`` script
> > > > > > > +of Xen might be helpful in implementing this.
> > > > > > > +
> > > > > > > +
> > > > > > > +Metadata service
> > > > > > > +++++++++++++++++
> > > > > > > +
> > > > > > > +An instance will be able to reach metadata service on ``169.254.169.254:80`` in
> > > > > > > +order to, for example, retrieve its metadata. This IP address and port were
> > > > > > > +chosen for compatibility with the OpenStack and Amazon EC2 metadata service.
> > > > > > > +The metadata service will be provided by a single daemon, which will determine
> > > > > > > +the source instance for a given request and reply with the metadata pertaining
> > > > > > > +to that instance.
> > > > > > >
> > > > > > >  Where possible, the metadata will be provided in a way compatible with Amazon
> > > > > > >  EC2, at::
> > > > > > >
> > > > > > >    http://169.254.169.254/<version>/meta-data/*
> > > > > > >
> > > > > > > -If some metadata are Ganeti-specific and don't fit this structure, they will be
> > > > > > > -provided at::
> > > > > > > +Ganeti-specific metadata, that does not fit this structure, will be provided
> > > > > > > +at::
> > > > > > >
> > > > > > >    http://169.254.169.254/ganeti/<version>/meta_data.json
> > > > > > >
> > > > > > > -``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
> > > > > > > -the most recent available protocol version.
> > > > > > > +where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to
> > > > > > > +indicate the most recent available protocol version.
> > > > > > >
> > > > > > >  If needed in the future, this structure also allows us to support OpenStack's
> > > > > > >  metadata at::
> > > > > > >
> > > > > > >    http://169.254.169.254/openstack/<version>/meta_data.json
> > > > > > >
> > > > > > > -A bi-directional, pipe-like communication channel will be provided. The instance
> > > > > > > -will be able to receive data from the host by a GET request at::
> > > > > > > +A bi-directional, pipe-like communication channel will also be provided.
> > > > > > > +The instance will be able to receive data from the host by a GET request at::
> > > > > > >
> > > > > > >    http://169.254.169.254/ganeti/<version>/read
> > > > > > >
> > > > > > > @@ -331,12 +341,10 @@ and to send data to the host by a POST request at::
> > > > > > >    http://169.254.169.254/ganeti/<version>/write
> > > > > > >
> > > > > > >  As in a pipe, once the data are read, they will not be in the buffer anymore, so
> > > > > > > -subsequent GET requests to ``read`` will not return the same data twice.
> > > > > > > -Unlike a pipe, though, it will not be possible to perform blocking I/O
> > > > > > > -operations.
> > > > > > > +subsequent GET requests to ``read`` will not return the same data. However,
> > > > > > > +unlike a pipe, it will not be possible to perform blocking I/O operations.
> > > > > > >
> > > > > > > -The OS parameters will be accessible through a GET
> > > > > > > -request at::
> > > > > > > +The OS parameters will be accessible through a GET request at::
> > > > > > >
> > > > > > >    http://169.254.169.254/ganeti/<version>/os/parameters.json
> > > > > > >
> > > > > > > @@ -424,8 +432,61 @@ the total time allowed to setup an instance inside the appliance. It is mainly
> > > > > > >  meant as a safety measure to prevent an instance taken over by malicious scripts
> > > > > > >  to be available for a long time.
> > > > > > >
> > > > > > > -.. vim: set textwidth=72 :
> > > > > > > -.. Local Variables:
> > > > > > > -.. mode: rst
> > > > > > > -.. fill-column: 72
> > > > > > > -.. End:
> > > > > > > +
> > > > > > > +Port forwarding in KVM
> > > > > > > +++++++++++++++++++++++
> > > > > > > +
> > > > > > > +The communication mechanism could have been implemented in KVM using guest port
> > > > > > > +forwarding, as opposed to network interfaces.
> > > > > > > +There are two alternatives in
> > > > > > > +KVM's guest port forwarding, namely, creating a forwarding device, such as, a
> > > > > > > +TCP/IP connection, or executing a command. However, we have determined that
> > > > > > > +both of these options are not viable.
> > > > > > > +
> > > > > > > +A TCP/IP forwarding device can be created through the following KVM invocation::
> > > > > > > +
> > > > > > > +  kvm -net nic -net \
> > > > > > > +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > > > > > > +    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ...
> > > > > > > +
> > > > > > > +This invocation even has advantage that it can remap ports, which would have
> > > > > > > +allowed the metadata service daemon to run in port 8080 instead of 80. However,
> > > > > > > +in this scheme, KVM opens the TCP connection only once, when it is started, and,
> > > > > > > +if the connection breaks, KVM will not reconnect. Furthermore, this also
> > > > > > > +interferes with the HTTP protocol, which needs to dynamically establish and
> > > > > > > +close connections.
> > > > > > > +
> > > > > > > +The alternative to opening a single TCP/IP connection is to execute a command.
> > > > > > > +The KVM invocation for this is, for example, the following::
> > > > > > > +
> > > > > > > +  kvm -net nic -net \
> > > > > > > +    "user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > > > > > > +    guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ...
> > > > > > > +
> > > > > > > +The advantage of this approach is that the command is executed each time the
> > > > > > > +guest initiates a connection. This is the ideal situation, however, it is only
> > > > > > > +supported in KVM 1.2 and above, and, therefore, not viable because we want to
> > > > > > > +provide support for at least KVM version 1.0, which is the version provided by
> > > > > > > +Ubuntu LTS.
> > > > > > > +
> > > > > > > +
> > > > > > > +Alternatives to the DHCP server
> > > > > > > ++++++++++++++++++++++++++++++++
> > > > > > > +
> > > > > > > +There are alternatives to using the DHCP server, for example, by assigning
> > > > > > > +identical IP addresses to guests, such as, the IP address ``169.254.169.253``.
> > > > > > > +However, this introduces a routing problem, namely, how to route incoming
> > > > > > > +packets from the same source IP to the host. This problem can be overcome in a
> > > > > > > +number of ways.
> > > > > > > +
> > > > > > > +The first solution is to use NAT to translate the incoming guest IP address, for
> > > > > > > +example, ``169.254.169.253``, to an IP address unique within a single host, for
> > > > > > > +example, ``169.254.0.1``. Given that NAT through ``ip rule`` is deprecated,
> > > > > > > +users can resort to ``iptables``. Note that this has not yet been tested.
> > > > > > > +
> > > > > > > +Another option, which has indeed been tested in a prototype, is to connect the
> > > > > > > +TAP network interfaces of the guests to a bridge. The bridge takes the
> > > > > > > +configuration for the TAP network interfaces, namely, IP address
> > > > > > > +``169.254.169.254`` and netmask ``255.255.0.0``, thus leaving those interfaces
> > > > > > > +without an IP address. Note that in this setting, guests will be able to reach
> > > > > > > +each other, therefore, if necessary, additional ``iptables`` rules can be put in
> > > > > > > +place to prevent it.
> > > > > > > --
> > > > > > > 1.8.5.1
> > > > > >
> > > >
> > > > --
> > > > Vangelis Koukis
> > > > [email protected]
> > > > OpenPGP public key ID:
> > > > pub  1024D/1D038E97 2003-07-13 Vangelis Koukis <[email protected]>
> > > >      Key fingerprint = C5CD E02E 2C78 7C10 8A00 53D8 FBFC 3799 1D03 8E97
> > > >
> > > > Only those who will risk going too far
> > > > can possibly find out how far one can go.
> > > >   -- T.S. Eliot
> > > >
> > >
> > > --
> > > Jose Antonio Lopes
> > > Ganeti Engineering
> > > Google Germany GmbH
> > > Dienerstr. 12, 80331, München
> > >
> > > Registergericht und -nummer: Hamburg, HRB 86891
> > > Sitz der Gesellschaft: Hamburg
> > > Geschäftsführer: Graham Law, Christine Elizabeth Flores
> > > Steuernummer: 48/725/00206
> > > Umsatzsteueridentifikationsnummer: DE813741370
> > >
> > >
> > > # Configuration file for dnsmasq.
> > > #
> > > # Format is one option per line, legal options are the same
> > > # as the long options legal on the command line. See
> > > # "/usr/sbin/dnsmasq --help" or "man 8 dnsmasq" for details.
> > >
> > > bind-interfaces
> > > dhcp-authoritative
> > > leasefile-ro
> > > no-hosts
> > > no-resolv
> > > no-ping
> > > strict-order
> > >
> > > dhcp-range=169.254.0.0,169.254.169.253,255.255.0.0
> > > except-interface=lo
> > > pid-file=/var/run/ganeti/dnsmasq.pid
> > > port=0
> > >
> > > interface=vm1
> > > dhcp-host=52:54:00:12:34:56,169.254.0.1
> > >
> > > # Configuration file for dnsmasq.
> > > #
> > > # Format is one option per line, legal options are the same
> > > # as the long options legal on the command line. See
> > > # "/usr/sbin/dnsmasq --help" or "man 8 dnsmasq" for details.
> > >
> > > bind-interfaces
> > > dhcp-authoritative
> > > leasefile-ro
> > > no-hosts
> > > no-resolv
> > > no-ping
> > > strict-order
> > >
> > > dhcp-range=169.254.0.0,169.254.169.253,255.255.0.0
> > > except-interface=lo
> > > pid-file=/var/run/ganeti/dnsmasq.pid
> > > port=0
> > >
> > > interface=vm1,vm2
> > >
> > > dhcp-host=52:54:00:12:34:56,169.254.0.1
> > > dhcp-host=52:54:00:65:43:21,169.254.0.2
> > >
> >
>
> --
> Thomas Thrainer | Software Engineer | [email protected] |
>
> Google Germany GmbH
> Dienerstr. 12
> 80331 München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores

--
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
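
P.S. A minimal sketch of how the MAC/IP bindings (but not the interfaces)
might be refreshed without a restart, going only by the manpage excerpt
above, which says that any file given by --dhcp-hostsfile is re-loaded on
SIGHUP. It assumes dnsmasq is additionally started with a hypothetical
--dhcp-hostsfile=/var/run/ganeti/dnsmasq.hosts; that path and the helper
names below are made up for illustration, and only the pid file location is
taken from the attached configuration. It does not tie a MAC address to a
specific TAP interface, and a new 'interface=' entry would still need a
restart.

```python
import os
import signal
import tempfile


def render_dhcp_hosts(bindings):
    """Return dhcp-hostsfile text, one '<mac>,<ip>' entry per line.

    Each line uses the same syntax as the right-hand side of a
    'dhcp-host=' option.  'bindings' maps MAC address to IP address.
    """
    return "".join(
        "%s,%s\n" % (mac, ip) for mac, ip in sorted(bindings.items())
    )


def write_atomically(path, text):
    """Write 'text' to 'path' via a temporary file in the same directory,
    so that a reader never sees a half-written hosts file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        os.replace(tmp, path)
    except Exception:
        # Clean up the temporary file if writing or renaming failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def reload_dnsmasq(pid_file):
    """Send SIGHUP to the process whose PID is stored in 'pid_file'."""
    with open(pid_file) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGHUP)


# Hypothetical usage (paths are assumptions, see above):
#   text = render_dhcp_hosts({"52:54:00:12:34:56": "169.254.0.1"})
#   write_atomically("/var/run/ganeti/dnsmasq.hosts", text)
#   reload_dnsmasq("/var/run/ganeti/dnsmasq.pid")
```

Whether this is good enough depends on how often the set of TAP interfaces
changes, since that part still goes through the configuration file.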
