On Wed, Jan 15, 2014 at 01:13:07PM +0200, Vangelis Koukis wrote:
> On Tue, Jan 14, 2014 at 05:25:20pm +0100, Jose A. Lopes wrote:
> > > > > In a similar note, who will be responsible for setting up the DHCP
> > > > > server? It could be the administrator's responsibility, but then if it
> > > > > is Ganeti the entity which picks the MAC addresses and IPs for the guest
> > > > > side of the TAP interfaces, how will this DHCP server be notified, so as
> > > > > to only server the correct IP addresses to specific MAC addresses?
> > > >
> > > > Ganeti configures the DHCP server, starts it and stops it. Ganeti
> > > > also reconfigures the DHCP server when a new VM is started/stopped.
> > > > The DHCP server listens only on the TAP interfaces for the VMs so it
> > > > shouldn't interfere with other DHCP servers running on the host. I
> > > > will make it more clear in the design doc.
> > > >
> > > > Currently, I have only experimented with 'dnsmasq'. This DHCP server
> > > > allows all of the above. The only thing that could be improved is the
> > > > fact that it is not possible to dynamically extend the interfaces
> > > > 'dnsmasq' is listening to. Therefore, it is necessary to update the
> > > > configuration file and restart 'dnsmasq'.
> > > >
> > >
> > > Have you been able to give specific (tap, MAC, IP) tuples to dnsmasq,
> > > somehow binding a MAC address on a specific TAP interface?
> > > In other words, how do you instruct dnsmasq to only honor a DHCP request
> > > from a specific MAC address, if it only comes from a specific TAP?
> > >
> > > I'm looking at the dnsmasq manpage for the "--dhcp-host" argument:
> > >
> > > -G,
> > > --dhcp-host=[<hwaddr>][,id:<client_id>|*][,set:<tag>][,<ipaddr>][,<hostname>][,<lease_time>][,ignore]
> > >
> > > and can't seem to find an obvious way to do it.
> >
> > The way I am doing right is by placing the following in the dnsmasq
> > configuration file:
> >
> >   interface=vm1,vm2
> >
> >   dhcp-host=52:54:00:12:34:56,169.254.0.1
> >   dhcp-host=52:54:00:65:43:21,169.254.0.2
> >
> > where 'interface' is the set of interfaces to listen on and
> > 'dhcp-host' specifies the bindings. Naturally, this can be changed to
> > listen only on one interface, etc. See the files in attachment for
> > more examples.
> >
>
> Hello Jose,
>
> it's not clear to me how this enforces a clear association between a
> single TAP interface and a single MAC address. I understand that the
> DHCP server listens on both interfaces, vm1 and vm2, but where is the
> association made that the VM on interface "vm1" may only use MAC address
> 52:54:00:12:34:56 (first one), and not 52:54:00:65:43:21 (the second one)?
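
To make the pairing concrete, here is a rough Python sketch of how the
dnsmasq options quoted above and the matching per-guest host routes
could be derived from the same (TAP, MAC, IP) tuples. This is not Ganeti
code: the 'Guest' type and the two helper functions are hypothetical
names used only for illustration, and the sketch only renders text. It
does not write the configuration file, restart dnsmasq, or run 'route'.

```python
from typing import List, NamedTuple


class Guest(NamedTuple):
    """Hypothetical record for one instance's communication NIC."""
    tap: str  # TAP interface name on the host, e.g. "vm1"
    mac: str  # MAC address assigned to the guest side of the TAP
    ip: str   # unique link-local IP handed out to the guest


def render_dnsmasq(guests: List[Guest]) -> str:
    """Render the 'interface' and 'dhcp-host' options for all guests."""
    lines = []
    if guests:
        # Same comma-separated form as in the quoted configuration.
        lines.append("interface=" + ",".join(g.tap for g in guests))
    # One MAC-to-IP binding per guest.
    lines.extend(f"dhcp-host={g.mac},{g.ip}" for g in guests)
    return "".join(line + "\n" for line in lines)


def route_commands(guests: List[Guest]) -> List[str]:
    """Return one host-route command per guest, tying its IP to its TAP."""
    return [f"route add -host {g.ip} dev {g.tap}" for g in guests]


if __name__ == "__main__":
    guests = [
        Guest("vm1", "52:54:00:12:34:56", "169.254.0.1"),
        Guest("vm2", "52:54:00:65:43:21", "169.254.0.2"),
    ]
    print(render_dnsmasq(guests), end="")
    for cmd in route_commands(guests):
        print(cmd)
```

With the two guests from the example above, this prints the same
'interface' and 'dhcp-host' options, followed by one
'route add -host <ip> dev <ifname>' command per guest.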

The combination of the dnsmasq conf plus the routing rules does enforce
a single association. I have sent a new version of the document and
added a paragraph explaining this in detail. Please go through it and
if you have any questions let me know.

> Also, how do you plan to update the interfaces on which dnsmasq listens,
> and the contents of its TAP<->MAC<->IP database, dynamically? Rewrite
> the configuration files, then restart the server? I think this will be a
> source of problems, especially in more cloud-like environments, where
> VMs are expected to go up and down at high rates.

Just to clarify, there is an instance of the DHCP server running per
node. Just out of curiosity, what kind of rates are you dealing with?
How many machines per second/minute/hour go up and down on average?

Thanks,
Jose

> Thank you,
> Vangelis.
> >
> > > Can you share more information on your experimental setup?
> > > Is every TAP interface independent, do you have them all on a bridge?
> >
> > My experimental setup is:
> >
> > 1. sudo ./dns1.sh
> >
> >    This starts the first KVM instance and configures its TAP interface,
> >    routes, etc, and also starts the DHCP server with configuration file
> >    dnsmasq1.conf
> >
> > 2. sudo ./dns2.sh
> >
> >    This starts the second KVM instance and configures the stuff as
> >    well, and restarts the DHCP server.
> >
> > I have included all files from my experimental setup in attachment if
> > you want to have a look.
> >
> > > I'll come back to the issue of updating dnsmasq configurations and handling
> > > multiple TAP interfaces concurrently in a reply to your other mails about
> > > nfdhcpd.
> > >
> > > Thanks,
> > > Vangelis.
> > >
> > > >
> > > > > Also, if it is the administrator's responsibility, then perhaps the
> > > > > admin should be able to set up standard ifup hooks, like for every
> > > > > other interface of an instance. But in the following examples, you
> > > > > specifically set script=no,downscript=no.
> > > > >
> > > > > Another possibility would be for Ganeti to come prepackaged with its
> > > > > own, embedded DHCP server just for serving requests on the TAPs used for
> > > > > the communication mechanism. We've been using snf-nfdhcpd
> > > > > (https://code.grnet.gr/projects/snf-nfdhcpd) for just that in
> > > > > production.
> > > > >
> > > > > Actually, in previous conversation Guido had asked us to document how to
> > > > > set it up with Ganeti, and merge the resulting docs with the Ganeti
> > > > > upstream. Perhaps it would make sense to combine the effort now, and use
> > > > > snf-nfdhcpd as an embedded DHCP server with Ganeti. Sorry for not having
> > > > > documented it earlier.
> > > >
> > > > I'm going to have a look at this and ask Guido about it.
> > > >
> > > > > > +DHCP protocol on its last network interface to contact a DHCP server running on
> > > > > > +the host and thus determine its IP address. The DHCP server will be listening
> > > > > > +exclusively on the TAP network interfaces of the guests. Therefore, it will not
> > > > > > +interfere with a potential DHCP server running on the same host. Furthermore,
> > > > > > +the DHCP server will only recognize MAC and IP address pairs that have been
> > > > > > +approved by Ganeti.
> > > > > > +
> > > > > > +The TAP network interfaces created for each guest all share the same IP address.
> > > > > > +Therefore, it will be necessary to extend the routing table with rules specific
> > > > > > +to each guest. This can be achieved with the following command, which takes the
> > > > > > +guest's unique IP address and its TAP interface::
> > > > > > +
> > > > > > +  route add -host <ip> dev <ifname>
> > > > > > +
> > > > > > +For KVM, an instance will be started with a unique MAC address and the TAP
> > > > > > +network interface name meant to be used by the communication mechanism. KVM
> > > > > > +creates the actual interface::
> > > > > > +
> > > > > > +  kvm -net nic,macaddr=<mac> -net tap,ifname=<ifname>,script=no,downscript=no ...
> > > > > > +
> > > > >
> > > > > If I understand correctly, in previous versions of Ganeti it used to be
> > > > > the case that KVM opened the actual TAP interface, upon initialization
> > > > > of the KVM process. This was changed however (see commit 5d9bfd870a) so
> > > > > that Ganeti itself created the TAP interface, then passed it as an open
> > > > > file descriptor to the KVM process. Is there any reason to deviate from
> > > > > this, and make handling the TAP interface for the communication
> > > > > mechanism a special case?
> > > > >
> > > > > Also, the same question applies as above. If setting up the DHCP server
> > > > > is the responsibility of the administrator, then perhaps Ganeti should
> > > > > support running ifup hooks for the TAPs. Or, Ganeti could come with its
> > > > > own embedded DHCP server and handle everything by itself, without
> > > > > messing with an already existing DHCP server.
> > > > >
> > > > > Thanks,
> > > > > Vangelis.
> > > > >
> > > > > > +For Xen, a network interface will be created on the host (using the ``vif``
> > > > > > +parameter of the Xen configuration file). Each instance will have its
> > > > > > +corresponding ``vif`` network interface on the host. The ``vif-route`` script
> > > > > > +of Xen might be helpful in implementing this.
> > > > > > +
> > > > > > +
> > > > > > +Metadata service
> > > > > > +++++++++++++++++
> > > > > > +
> > > > > > +An instance will be able to reach metadata service on ``169.254.169.254:80`` in
> > > > > > +order to, for example, retrieve its metadata. This IP address and port were
> > > > > > +chosen for compatibility with the OpenStack and Amazon EC2 metadata service.
> > > > > > +The metadata service will be provided by a single daemon, which will determine
> > > > > > +the source instance for a given request and reply with the metadata pertaining
> > > > > > +to that instance.
> > > > > >
> > > > > >  Where possible, the metadata will be provided in a way compatible with Amazon
> > > > > >  EC2, at::
> > > > > >
> > > > > >    http://169.254.169.254/<version>/meta-data/*
> > > > > >
> > > > > > -If some metadata are Ganeti-specific and don't fit this structure, they will be
> > > > > > -provided at::
> > > > > > +Ganeti-specific metadata, that does not fit this structure, will be provided
> > > > > > +at::
> > > > > >
> > > > > >    http://169.254.169.254/ganeti/<version>/meta_data.json
> > > > > >
> > > > > > -``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to indicate
> > > > > > -the most recent available protocol version.
> > > > > > +where ``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to
> > > > > > +indicate the most recent available protocol version.
> > > > > >
> > > > > >  If needed in the future, this structure also allows us to support OpenStack's
> > > > > >  metadata at::
> > > > > >
> > > > > >    http://169.254.169.254/openstack/<version>/meta_data.json
> > > > > >
> > > > > > -A bi-directional, pipe-like communication channel will be provided. The instance
> > > > > > -will be able to receive data from the host by a GET request at::
> > > > > > +A bi-directional, pipe-like communication channel will also be provided. The
> > > > > > +instance will be able to receive data from the host by a GET request at::
> > > > > >
> > > > > >    http://169.254.169.254/ganeti/<version>/read
> > > > > >
> > > > > > @@ -331,12 +341,10 @@ and to send data to the host by a POST request at::
> > > > > >    http://169.254.169.254/ganeti/<version>/write
> > > > > >
> > > > > >  As in a pipe, once the data are read, they will not be in the buffer anymore, so
> > > > > > -subsequent GET requests to ``read`` will not return the same data twice.
> > > > > > -Unlike a pipe, though, it will not be possible to perform blocking I/O
> > > > > > -operations.
> > > > > > +subsequent GET requests to ``read`` will not return the same data. However,
> > > > > > +unlike a pipe, it will not be possible to perform blocking I/O operations.
> > > > > >
> > > > > > -The OS parameters will be accessible through a GET
> > > > > > -request at::
> > > > > > +The OS parameters will be accessible through a GET request at::
> > > > > >
> > > > > >    http://169.254.169.254/ganeti/<version>/os/parameters.json
> > > > > >
> > > > > > @@ -424,8 +432,61 @@ the total time allowed to setup an instance inside the appliance. It is mainly
> > > > > >  meant as a safety measure to prevent an instance taken over by malicious scripts
> > > > > >  to be available for a long time.
> > > > > >
> > > > > > -.. vim: set textwidth=72 :
> > > > > > -.. Local Variables:
> > > > > > -.. mode: rst
> > > > > > -.. fill-column: 72
> > > > > > -.. End:
> > > > > > +
> > > > > > +
> > > > > > +Port forwarding in KVM
> > > > > > +++++++++++++++++++++++
> > > > > > +
> > > > > > +The communication mechanism could have been implemented in KVM using guest port
> > > > > > +forwarding, as opposed to network interfaces. There are two alternatives in
> > > > > > +KVM's guest port forwarding, namely, creating a forwarding device, such as, a
> > > > > > +TCP/IP connection, or executing a command. However, we have determined that
> > > > > > +both of these options are not viable.
> > > > > > +
> > > > > > +A TCP/IP forwarding device can be created through the following KVM invocation::
> > > > > > +
> > > > > > +  kvm -net nic -net \
> > > > > > +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > > > > > +    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ...
> > > > > > +
> > > > > > +This invocation even has advantage that it can remap ports, which would have
> > > > > > +allowed the metadata service daemon to run in port 8080 instead of 80. However,
> > > > > > +in this scheme, KVM opens the TCP connection only once, when it is started, and,
> > > > > > +if the connection breaks, KVM will not reconnect. Furthermore, this also
> > > > > > +interferes with the HTTP protocol, which needs to dynamically establish and
> > > > > > +close connections.
> > > > > > +
> > > > > > +The alternative to opening a single TCP/IP connection is to execute a command.
> > > > > > +The KVM invocation for this is, for example, the following::
> > > > > > +
> > > > > > +  kvm -net nic -net \
> > > > > > +    "user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > > > > > +    guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ...
> > > > > > +
> > > > > > +The advantage of this approach is that the command is executed each time the
> > > > > > +guest initiates a connection. This is the ideal situation, however, it is only
> > > > > > +supported in KVM 1.2 and above, and, therefore, not viable because we want to
> > > > > > +provide support for at least KVM version 1.0, which is the version provided by
> > > > > > +Ubuntu LTS.
> > > > > > +
> > > > > > +
> > > > > > +Alternatives to the DHCP server
> > > > > > ++++++++++++++++++++++++++++++++
> > > > > > +
> > > > > > +There are alternatives to using the DHCP server, for example, by assigning
> > > > > > +identical IP addresses to guests, such as, the IP address ``169.254.169.253``.
> > > > > > +However, this introduces a routing problem, namely, how to route incoming
> > > > > > +packets from the same source IP to the host. This problem can be overcome in a
> > > > > > +number of ways.
> > > > > > +
> > > > > > +The first solution is to use NAT to translate the incoming guest IP address, for
> > > > > > +example, ``169.254.169.253``, to an IP address unique within a single host, for
> > > > > > +example, ``169.254.0.1``. Given that NAT through ``ip rule`` is deprecated,
> > > > > > +users can resort to ``iptables``. Note that this has not yet been tested.
> > > > > > +
> > > > > > +Another option, which has indeed been tested in a prototype, is to connect the
> > > > > > +TAP network interfaces of the guests to a bridge. The bridge takes the
> > > > > > +configuration for the TAP network interfaces, namely, IP address
> > > > > > +``169.254.169.254`` and netmask ``255.255.0.0``, thus leaving those interfaces
> > > > > > +without an IP address. Note that in this setting, guests will be able to reach
> > > > > > +each other, therefore, if necessary, additional ``iptables`` rules can be put in
> > > > > > +place to prevent it.
> > > > > > --
> > > > > > 1.8.5.1
> > > > >
> > >
> > > --
> > > Vangelis Koukis
> > > [email protected]
> > > OpenPGP public key ID:
> > > pub 1024D/1D038E97 2003-07-13 Vangelis Koukis <[email protected]>
> > > Key fingerprint = C5CD E02E 2C78 7C10 8A00 53D8 FBFC 3799 1D03 8E97
> > >
> > > Only those who will risk going too far
> > > can possibly find out how far one can go.
> > > -- T.S. Eliot
> > >
> >
> > --
> > Jose Antonio Lopes
> > Ganeti Engineering
> > Google Germany GmbH
> > Dienerstr. 12, 80331, München
> >
> > Registergericht und -nummer: Hamburg, HRB 86891
> > Sitz der Gesellschaft: Hamburg
> > Geschäftsführer: Graham Law, Christine Elizabeth Flores
> > Steuernummer: 48/725/00206
> > Umsatzsteueridentifikationsnummer: DE813741370
> >
>
> > # Configuration file for dnsmasq.
> > #
> > # Format is one option per line, legal options are the same
> > # as the long options legal on the command line. See
> > # "/usr/sbin/dnsmasq --help" or "man 8 dnsmasq" for details.
> >
> > bind-interfaces
> > dhcp-authoritative
> > leasefile-ro
> > no-hosts
> > no-resolv
> > no-ping
> > strict-order
> >
> > dhcp-range=169.254.0.0,169.254.169.253,255.255.0.0
> > except-interface=lo
> > pid-file=/var/run/ganeti/dnsmasq.pid
> > port=0
> >
> > interface=vm1
> > dhcp-host=52:54:00:12:34:56,169.254.0.1
>
> > # Configuration file for dnsmasq.
> > #
> > # Format is one option per line, legal options are the same
> > # as the long options legal on the command line. See
> > # "/usr/sbin/dnsmasq --help" or "man 8 dnsmasq" for details.
> >
> > bind-interfaces
> > dhcp-authoritative
> > leasefile-ro
> > no-hosts
> > no-resolv
> > no-ping
> > strict-order
> >
> > dhcp-range=169.254.0.0,169.254.169.253,255.255.0.0
> > except-interface=lo
> > pid-file=/var/run/ganeti/dnsmasq.pid
> > port=0
> >
> > interface=vm1,vm2
> >
> > dhcp-host=52:54:00:12:34:56,169.254.0.1
> > dhcp-host=52:54:00:65:43:21,169.254.0.2
>

--
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München
