On Tue, Jan 14, 2014 at 05:21:27PM +0200, Vangelis Koukis wrote:
> On Mon, Jan 13, 2014 at 10:19:11am +0100, Jose A. Lopes wrote:
> > [snip]
> 
> Hello Jose,
> 
> Thanks for your detailed answer!
> Comments follow inline.
> 
> > > > +On the host side, these TAP network interfaces will have IP address
> > > > +``169.254.169.254`` in the network ``169.254.0.0/16`` (i.e., netmask
> > > > +``255.255.0.0``).  On the guest side, each instance will have its own MAC
> 
> This was my initial concern: If I set up an interface (say tap0) to have
> IP 169.254.169.254 with netmask 255.255.0.0, doesn't this imply that
> there is going to be a new entry in the routing table to route network
> 169.254.0.0/16 via tap0? How can this work if I assign the *same* IP,
> with the same non-/32 netmask to multiple interfaces?

Answered below.

> > > > +address and an IP address in the network ``169.254.0.0/16``.  The MAC address
> > > > +and the IP address must be unique within a single host. The guest will use the
> > > 
> > > It's not very clear to me who will be responsible for setting up
> > > the host side of the TAP interfaces. Who will be responsible for
> > > assigning the IP address 169.254.169.254 on all TAP interfaces of the
> > > host, and what will the routing rules be? To clarify, say I have 3 VMs,
> > > on tap0, tap1 and tap2, with IPs 169.254.0.1, 169.254.0.2, 169.254.0.3
> > > respectively.
> > 
> > Ganeti configures the interfaces.  It seems a good idea for Ganeti to
> > create the TAP interfaces and pass them as file descriptors to KVM, as
> > you suggested.  Ganeti will then configure the IP address on the
> > interface.  I will update the design doc with this.
> > 
> > > If the host has IP 169.254.169.254 on all interfaces, with the same /16
> > > netmask, how will it be able to pick the right interface when sending an
> > > IP packet to VM1 vs. when sending to VM3?
> > > 
> > > I think this could work with explicit routes: one to 169.254.0.1/32
> > > through tap0, one to 169.254.0.2/32 through tap1, and one to
> > > 169.254.0.3/32 through tap2. If yes, will Ganeti set up these routes
> > > explicitly?
> > 
> > Isn't this already solved with the 'route add -host ...'?
> > 
> 
> Yes, I think this would be OK. For example, in the case of VM1 being on
> tap0 and having IP 169.254.0.1, this would add a new routing entry with
> a /32 netmask, e.g.:
> 
> # ip ro ls
> ...
> 169.254.0.1 dev tap0  proto static  scope link
> 
> So, also in light of the discussion about the netmask above, what sense would
> it make to have IP 169.254.169.254 with a netmask of /16 on this interface? Would
> it be better to say something along the lines of "The host will have IP
> 169.254.169.254 with a netmask of /32 on all interfaces, and an explicit route
> for the VM behind each interface"? (e.g., 169.254.0.1/32 dev tap0,
> 169.254.0.2/32 dev tap1, and so on).
> 
> # ip addr add 169.254.169.254 dev tap0
> # ip addr list
> ...
>     inet 169.254.169.254/32 scope global tap0
> 
> Otherwise, trying to specify a /16 netmask leads to the creation of an
> (unwanted?) routing entry for the whole /16 network, and multiple
> routing entries, which don't really contribute anything:
> 
> 169.254.0.0/16 dev tap0  proto kernel  scope link  src 169.254.169.254
> 169.254.0.0/16 dev tap1  proto kernel  scope link  src 169.254.169.254

You're absolutely right! I will update the design document to use /32
instead of /16 for the netmask.
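
As a rough sketch of what the per-instance host setup could then look
like (this is not taken from the design document; the helper name
'print_tap_setup' and its argument order are made up for illustration,
and the commands are only printed, not executed, so they can be
reviewed before running them as root):

```shell
#!/bin/sh
# Print (not execute) the host-side commands for one guest TAP interface:
# the shared metadata address with a /32 netmask, plus one host route
# for the guest behind that interface.
# Hypothetical helper; the name and argument order are assumptions.
print_tap_setup() {
    ifname=$1    # host-side TAP interface, e.g. tap0
    guest_ip=$2  # guest address unique within the host, e.g. 169.254.0.1
    echo "ip addr add 169.254.169.254/32 dev $ifname"
    echo "ip link set dev $ifname up"
    echo "ip route add $guest_ip/32 dev $ifname"
}

print_tap_setup tap0 169.254.0.1
print_tap_setup tap1 169.254.0.2
```

The 'ip route add <ip>/32 dev <ifname>' line plays the same role as
'route add -host <ip> dev <ifname>' from the patch.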

> > > On a similar note, who will be responsible for setting up the DHCP
> > > server? It could be the administrator's responsibility, but then if
> > > Ganeti is the entity which picks the MAC addresses and IPs for the guest
> > > side of the TAP interfaces, how will this DHCP server be notified, so as
> > > to only serve the correct IP addresses to specific MAC addresses?
> > 
> > Ganeti configures the DHCP server, starts it and stops it.  Ganeti
> > also reconfigures the DHCP server when a new VM is started/stopped.
> > The DHCP server listens only on the TAP interfaces for the VMs so it
> > shouldn't interfere with other DHCP servers running on the host.  I
> > will make it more clear in the design doc.
> > 
> > Currently, I have only experimented with 'dnsmasq'.  This DHCP server
> > allows all of the above.  The only thing that could be improved is the
> > fact that it is not possible to dynamically extend the interfaces
> > 'dnsmasq' is listening to.  Therefore, it is necessary to update the
> > configuration file and restart 'dnsmasq'.
> > 
> 
> Have you been able to give specific (tap, MAC, IP) tuples to dnsmasq,
> somehow binding a MAC address on a specific TAP interface?
> In other words, how do you instruct dnsmasq to only honor a DHCP request
> from a specific MAC address, if it only comes from a specific TAP?
> 
> I'm looking at the dnsmasq manpage for the "--dhcp-host" argument:
> 
> -G, --dhcp-host=[<hwaddr>][,id:<client_id>|*][,set:<tag>][,<ipaddr>][,<hostname>][,<lease_time>][,ignore]
> 
> and can't seem to find an obvious way to do it.

The way I am doing it right now is by placing the following in the
dnsmasq configuration file:

  interface=vm1,vm2

  dhcp-host=52:54:00:12:34:56,169.254.0.1
  dhcp-host=52:54:00:65:43:21,169.254.0.2

where 'interface' is the set of interfaces to listen on and
'dhcp-host' specifies the MAC-to-IP bindings.  Naturally, this can be
changed to listen only on one interface, etc.  See the attached files
for more examples.
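
Since the configuration file has to be rewritten whenever an instance
starts or stops, the two kinds of lines above could be regenerated from
a list of instances.  A minimal sketch, assuming one
"<ifname> <mac> <ip>" triple per line on stdin (the helper name
'print_dnsmasq_bindings' and this input format are hypothetical, not
part of my scripts):

```shell
#!/bin/sh
# Print the per-instance part of a dnsmasq configuration from
# "<ifname> <mac> <ip>" triples read from stdin, one per line.
# Hypothetical helper; only 'interface' and 'dhcp-host' lines are emitted.
print_dnsmasq_bindings() {
    ifaces=""
    hosts=""
    while read -r ifname mac ip; do
        [ -n "$ifname" ] || continue          # skip blank lines
        ifaces="${ifaces:+$ifaces,}$ifname"   # comma-separated interface list
        hosts="${hosts}dhcp-host=$mac,$ip
"
    done
    printf 'interface=%s\n\n%s' "$ifaces" "$hosts"
}

print_dnsmasq_bindings <<EOF
vm1 52:54:00:12:34:56 169.254.0.1
vm2 52:54:00:65:43:21 169.254.0.2
EOF
```

The remaining options (dhcp-range, pid-file, and so on) stay fixed and
would simply be prepended to this output.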

> Can you share more information on your experimental setup?
> Is every TAP interface independent, do you have them all on a bridge?

My experimental setup is:

1. sudo ./dns1.sh

  This starts the first KVM instance and configures its TAP interface,
  routes, etc, and also starts the DHCP server with configuration file
  dnsmasq1.conf

2. sudo ./dns2.sh

  This starts the second KVM instance, configures its TAP interface,
  routes, etc., as well, and restarts the DHCP server.

I have attached all files from my experimental setup in case you want
to have a look.

> I'll come back to the issue of updating dnsmasq configurations and handling
> multiple TAP interfaces concurrently in a reply to your other mails about
> nfdhcpd.
> 
> Thanks,
> Vangelis.
> 
> > > Also, if it is the administrator's responsibility, then perhaps the
> > > admin should be able to set up standard ifup hooks, like for every
> > > other interface of an instance. But in the following examples, you
> > > specifically set script=no,downscript=no.
> > > 
> > > Another possibility would be for Ganeti to come prepackaged with its
> > > own, embedded DHCP server just for serving requests on the TAPs used for
> > > the communication mechanism. We've been using snf-nfdhcpd
> > > (https://code.grnet.gr/projects/snf-nfdhcpd) for just that in
> > > production.
> > >
> > > Actually, in previous conversation Guido had asked us to document how to
> > > set it up with Ganeti, and merge the resulting docs with the Ganeti
> > > upstream. Perhaps it would make sense to combine the effort now, and use
> > > snf-nfdhcpd as an embedded DHCP server with Ganeti. Sorry for not having
> > > documented it earlier.
> > 
> > I'm going to have a look at this and ask Guido about it.
> > 
> > > > +DHCP protocol on its last network interface to contact a DHCP server 
> > > > running on
> > > > +the host and thus determine its IP address.  The DHCP server will be 
> > > > listening
> > > > +exclusively on the TAP network interfaces of the guests.  Therefore, 
> > > > it will not
> > > > +interfere with a potential DHCP server running on the same host.  
> > > > Furthermore,
> > > > +the DHCP server will only recognize MAC and IP address pairs that have 
> > > > been
> > > > +approved by Ganeti.
> > > > +
> > > > +The TAP network interfaces created for each guest all share the same 
> > > > IP address.
> > > > +Therefore, it will be necessary to extend the routing table with rules 
> > > > specific
> > > > +to each guest.  This can be achieved with the following command, which 
> > > > takes the
> > > > +guest's unique IP address and its TAP interface::
> > > > +
> > > > +  route add -host <ip> dev <ifname>
> > > > +
> > > > +For KVM, an instance will be started with a unique MAC address and the 
> > > > TAP
> > > > +network interface name meant to be used by the communication 
> > > > mechanism.  KVM
> > > > +creates the actual interface::
> > > > +
> > > > +  kvm -net nic,macaddr=<mac> -net tap,ifname=<ifname>,script=no,downscript=no ...
> > > > +
> > > 
> > > If I understand correctly, in previous versions of Ganeti it used to be
> > > the case that KVM opened the actual TAP interface, upon initialization
> > > of the KVM process. This was changed however (see commit 5d9bfd870a) so
> > > that Ganeti itself created the TAP interface, then passed it as an open
> > > file descriptor to the KVM process. Is there any reason to deviate from
> > > this, and make handling the TAP interface for the communication
> > > mechanism a special case?
> > > 
> > > Also, the same question applies as above. If setting up the DHCP server 
> > > is the responsibility of the administrator, then perhaps Ganeti should
> > > support running ifup hooks for the TAPs. Or, Ganeti could come with its
> > > own embedded DHCP server and handle everything by itself, without
> > > messing with an already existing DHCP server.
> > > 
> > > Thanks,
> > > Vangelis.
> > > 
> > > > +For Xen, a network interface will be created on the host (using the 
> > > > ``vif``
> > > > +parameter of the Xen configuration file).  Each instance will have its
> > > > +corresponding ``vif`` network interface on the host.  The 
> > > > ``vif-route`` script
> > > > +of Xen might be helpful in implementing this.
> > > > +
> > > > +
> > > > +Metadata service
> > > > +++++++++++++++++
> > > > +
> > > > +An instance will be able to reach metadata service on 
> > > > ``169.254.169.254:80`` in
> > > > +order to, for example, retrieve its metadata.  This IP address and 
> > > > port were
> > > > +chosen for compatibility with the OpenStack and Amazon EC2 metadata 
> > > > service.
> > > > +The metadata service will be provided by a single daemon, which will 
> > > > determine
> > > > +the source instance for a given request and reply with the metadata 
> > > > pertaining
> > > > +to that instance.
> > > >  
> > > >  Where possible, the metadata will be provided in a way compatible with 
> > > > Amazon
> > > >  EC2, at::
> > > >  
> > > >    http://169.254.169.254/<version>/meta-data/*
> > > >  
> > > > -If some metadata are Ganeti-specific and don't fit this structure, 
> > > > they will be
> > > > -provided at::
> > > > +Ganeti-specific metadata, that does not fit this structure, will be 
> > > > provided
> > > > +at::
> > > >  
> > > >    http://169.254.169.254/ganeti/<version>/meta_data.json
> > > >  
> > > > -``<version>`` is either a date in YYYY-MM-DD format, or ``latest`` to 
> > > > indicate
> > > > -the most recent available protocol version.
> > > > +where ``<version>`` is either a date in YYYY-MM-DD format, or 
> > > > ``latest`` to
> > > > +indicate the most recent available protocol version.
> > > >  
> > > >  If needed in the future, this structure also allows us to support 
> > > > OpenStack's
> > > >  metadata at::
> > > >  
> > > >    http://169.254.169.254/openstack/<version>/meta_data.json
> > > >  
> > > > -A bi-directional, pipe-like communication channel will be provided. 
> > > > The instance
> > > > -will be able to receive data from the host by a GET request at::
> > > > +A bi-directional, pipe-like communication channel will also be 
> > > > provided.  The
> > > > +instance will be able to receive data from the host by a GET request 
> > > > at::
> > > >  
> > > >    http://169.254.169.254/ganeti/<version>/read
> > > >  
> > > > @@ -331,12 +341,10 @@ and to send data to the host by a POST request 
> > > > at::
> > > >    http://169.254.169.254/ganeti/<version>/write
> > > >  
> > > >  As in a pipe, once the data are read, they will not be in the buffer 
> > > > anymore, so
> > > > -subsequent GET requests to ``read`` will not return the same data 
> > > > twice.
> > > > -Unlike a pipe, though, it will not be possible to perform blocking I/O
> > > > -operations.
> > > > +subsequent GET requests to ``read`` will not return the same data.  
> > > > However,
> > > > +unlike a pipe, it will not be possible to perform blocking I/O 
> > > > operations.
> > > >  
> > > > -The OS parameters will be accessible through a GET
> > > > -request at::
> > > > +The OS parameters will be accessible through a GET request at::
> > > >  
> > > >    http://169.254.169.254/ganeti/<version>/os/parameters.json
> > > >  
> > > > @@ -424,8 +432,61 @@ the total time allowed to setup an instance inside 
> > > > the appliance. It is mainly
> > > >  meant as a safety measure to prevent an instance taken over by 
> > > > malicious scripts
> > > >  to be available for a long time.
> > > >  
> > > > -.. vim: set textwidth=72 :
> > > > -.. Local Variables:
> > > > -.. mode: rst
> > > > -.. fill-column: 72
> > > > -.. End:
> > > > +
> > > > +Port forwarding in KVM
> > > > +++++++++++++++++++++++
> > > > +
> > > > +The communication mechanism could have been implemented in KVM using guest port
> > > > +forwarding, as opposed to network interfaces.  There are two alternatives in
> > > > +KVM's guest port forwarding, namely, creating a forwarding device, such as a
> > > > +TCP/IP connection, or executing a command.  However, we have determined that
> > > > +neither of these options is viable.
> > > > +
> > > > +A TCP/IP forwarding device can be created through the following KVM 
> > > > invocation::
> > > > +
> > > > +  kvm -net nic -net \
> > > > +    user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > > > +    guestfwd=tcp:169.254.169.254:80-tcp:127.0.0.1:8080 ...
> > > > +
> > > > +This invocation even has the advantage that it can remap ports, which would have
> > > > +allowed the metadata service daemon to run on port 8080 instead of 80.  However,
> > > > +in this scheme, KVM opens the TCP connection only once, when it is started, and,
> > > > +if the connection breaks, KVM will not reconnect.  Furthermore, this also
> > > > +interferes with the HTTP protocol, which needs to dynamically establish and
> > > > +close connections.
> > > > +
> > > > +The alternative to opening a single TCP/IP connection is to execute a 
> > > > command.
> > > > +The KVM invocation for this is, for example, the following::
> > > > +
> > > > +  kvm -net nic -net \
> > > > +    "user,restrict=on,net=169.254.169.0/24,host=169.254.169.253,
> > > > +    guestfwd=tcp:169.254.169.254:80-netcat 127.0.0.1 8080" ...
> > > > +
> > > > +The advantage of this approach is that the command is executed each time the
> > > > +guest initiates a connection.  This is the ideal situation; however, it is only
> > > > +supported in KVM 1.2 and above and is therefore not viable, because we want to
> > > > +provide support for at least KVM version 1.0, which is the version provided by
> > > > +Ubuntu LTS.
> > > > +
> > > > +
> > > > +Alternatives to the DHCP server
> > > > ++++++++++++++++++++++++++++++++
> > > > +
> > > > +There are alternatives to using the DHCP server, for example, assigning
> > > > +identical IP addresses to guests, such as the IP address ``169.254.169.253``.
> > > > +However, this introduces a routing problem, namely, how to route incoming
> > > > +packets from the same source IP to the host.  This problem can be overcome in a
> > > > +number of ways.
> > > > +
> > > > +The first solution is to use NAT to translate the incoming guest IP address, for
> > > > +example, ``169.254.169.253``, to an IP address unique within a single host, for
> > > > +example, ``169.254.0.1``.  Given that NAT through ``ip rule`` is deprecated,
> > > > +users can resort to ``iptables``.  Note that this has not yet been tested.
> > > > +
> > > > +Another option, which has indeed been tested in a prototype, is to connect the
> > > > +TAP network interfaces of the guests to a bridge.  The bridge takes the
> > > > +configuration for the TAP network interfaces, namely, IP address
> > > > +``169.254.169.254`` and netmask ``255.255.0.0``, thus leaving those interfaces
> > > > +without an IP address.  Note that in this setting, guests will be able to reach
> > > > +each other; therefore, if necessary, additional ``iptables`` rules can be put in
> > > > +place to prevent it.
> > > > -- 
> > > > 1.8.5.1
> > > 
> 
> -- 
> Vangelis Koukis
> [email protected]
> OpenPGP public key ID:
> pub  1024D/1D038E97 2003-07-13 Vangelis Koukis <[email protected]>
>      Key fingerprint = C5CD E02E 2C78 7C10 8A00  53D8 FBFC 3799 1D03 8E97
> 
> Only those who will risk going too far
> can possibly find out how far one can go.
>         -- T.S. Eliot



-- 
Jose Antonio Lopes
Ganeti Engineering
Google Germany GmbH
Dienerstr. 12, 80331, München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Steuernummer: 48/725/00206
Umsatzsteueridentifikationsnummer: DE813741370
# Configuration file for dnsmasq.
#
# Format is one option per line, legal options are the same
# as the long options legal on the command line. See
# "/usr/sbin/dnsmasq --help" or "man 8 dnsmasq" for details.

bind-interfaces
dhcp-authoritative
leasefile-ro
no-hosts
no-resolv
no-ping
strict-order

dhcp-range=169.254.0.0,169.254.169.253,255.255.0.0
except-interface=lo
pid-file=/var/run/ganeti/dnsmasq.pid
port=0

interface=vm1
dhcp-host=52:54:00:12:34:56,169.254.0.1

Attachment: dns1.sh
Description: Bourne shell script

Attachment: dns2.sh
Description: Bourne shell script

# Configuration file for dnsmasq.
#
# Format is one option per line, legal options are the same
# as the long options legal on the command line. See
# "/usr/sbin/dnsmasq --help" or "man 8 dnsmasq" for details.

bind-interfaces
dhcp-authoritative
leasefile-ro
no-hosts
no-resolv
no-ping
strict-order

dhcp-range=169.254.0.0,169.254.169.253,255.255.0.0
except-interface=lo
pid-file=/var/run/ganeti/dnsmasq.pid
port=0

interface=vm1,vm2

dhcp-host=52:54:00:12:34:56,169.254.0.1
dhcp-host=52:54:00:65:43:21,169.254.0.2
