I'm not sure why, but it looks like my intro email for this got eaten by
something. Here it is again, and sorry if it shows up twice. This is my
first time posting to the list and submitting a patch, and I guess
something doesn't like the way I did it.
-----
Hi all,
Here is something I've been tinkering with for the past few weeks. I now
have it in a state where the basic idea makes sense, it works, and it could
use some feedback from the community.
This is what I've been calling QDES, or QEMU Distributed Ethernet Switch. I
first had the idea when I was playing with the udp and mcast socket network
backends while exploring how to build a VM infrastructure. I liked the idea
of using the socket backends because they don't require elevated
permissions to configure and run, and because they can talk over IP
networks.
But the built-in socket backends allow either only 2 guests talking
directly, or multiple guests where all traffic is sent to all of them. So
one can either have just two guests talking or waste bandwidth with
multiple guests. There wasn't anything that could talk to multiple guests
and also use unicast traffic.
So I made a backend that can do this. It takes the basics of how the udp and
mcast socket backends work and combines them with some switching based on
the Ethernet frames. The result is that multiple guests can talk to each
other without wasting bandwidth by delivering unicast traffic to all guests.
The backend also adds some header data to each packet. This header includes
a network identifier, so multiple logical networks can be created using the
same multicast configuration while still keeping the guests separated.
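To make the switching idea concrete, here is a rough sketch of how a
learning table and a network-id check could fit together. This is not the
code from the patch. The struct layouts, the names (fdb, fdb_learn,
fdb_route, hdr_accept), the fixed table size, and the assumption that the
network id is the first two bytes of the header in big-endian order are all
made up for illustration. The only thing taken from the description is that
the identifier allows 65536 values, which suggests 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN    6
#define TABLE_SIZE 64   /* hypothetical: small fixed table for the sketch */

/* One learned guest MAC and the peer QEMU process it was seen from. */
struct fdb_entry {
    bool     used;
    uint8_t  mac[MAC_LEN];
    uint32_t peer_ip;     /* IPv4 address of the sending host */
    uint16_t peer_port;   /* UDP port of the sending QEMU process */
};

struct fdb {
    struct fdb_entry entries[TABLE_SIZE];
};

/* Group bit set in the first octet means broadcast or multicast. */
bool mac_is_group(const uint8_t *mac)
{
    return (mac[0] & 0x01) != 0;
}

struct fdb_entry *fdb_find(struct fdb *t, const uint8_t *mac)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->entries[i].used &&
            memcmp(t->entries[i].mac, mac, MAC_LEN) == 0) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* Learn: remember which peer a source MAC was last seen from. */
void fdb_learn(struct fdb *t, const uint8_t *src_mac,
               uint32_t ip, uint16_t port)
{
    assert(t != NULL && src_mac != NULL);
    struct fdb_entry *e = fdb_find(t, src_mac);
    if (!e) {
        for (int i = 0; i < TABLE_SIZE && !e; i++) {
            if (!t->entries[i].used) {
                e = &t->entries[i];
            }
        }
        if (!e) {
            return;   /* table full: frames to this MAC keep flooding */
        }
        e->used = true;
        memcpy(e->mac, src_mac, MAC_LEN);
    }
    e->peer_ip = ip;
    e->peer_port = port;
}

/*
 * Forwarding decision for an outgoing frame.
 * Returns the learned peer for a known unicast destination, or NULL when
 * the frame should go to the multicast group (broadcast, multicast, or
 * unknown destination).
 */
const struct fdb_entry *fdb_route(struct fdb *t, const uint8_t *dst_mac)
{
    if (mac_is_group(dst_mac)) {
        return NULL;
    }
    return fdb_find(t, dst_mac);
}

/* Accept an incoming datagram only if its network id matches ours.
 * Assumes the id is the first two header bytes, big-endian. */
bool hdr_accept(const uint8_t *pkt, size_t len, uint16_t my_net_id)
{
    if (len < 2) {
        return false;
    }
    uint16_t id = (uint16_t)((pkt[0] << 8) | pkt[1]);
    return id == my_net_id;
}
```

The point of the sketch is that unknown and group destinations fall back to
the multicast group, the same way a hardware switch floods, while known
unicast destinations go to a single peer.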
There are several advantages that I see to this. It allows multiple guests
in multiple locations to talk to each other while keeping unicast traffic
between just the two hosts involved. It doesn't require root permissions to
run. It can operate over non-Ethernet networks (like IPoIB). It doesn't
require changing the network configuration on the host. It allows a ton of
logical networks to be created (currently 65536 per multicast address and
port combination).
There are a few disadvantages as well. It does add some more processing to
the QEMU process, but not much (I saw it go as fast as the socket backends).
It encapsulates an Ethernet frame inside a UDP packet, so there is the
overhead of the IP and UDP headers as well as the transport medium headers
(most likely Ethernet again). Because there is additional header data, the
MTU of the guest could be limited, depending on the host's ability to send
larger multicast packets. (I haven't really looked closely at this last
one.) Nothing besides QEMU processes can communicate using this yet, though
I hope to build a utility to work with a tap device.
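As a rough illustration of the MTU point, here is the arithmetic for the
largest guest MTU that still fits in a single unfragmented UDP datagram.
This is a sketch under assumptions: IPv4 without options on the host, no
VLAN tag on the guest frame, and a backend header size passed in by the
caller, since the actual header size isn't stated above. The function name
guest_mtu_sketch is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

enum {
    IPV4_HDR_LEN = 20,  /* IPv4 header without options */
    UDP_HDR_LEN  = 8,
    ETH_HDR_LEN  = 14   /* inner (guest) Ethernet header, no VLAN tag */
};

/*
 * Largest guest MTU that fits in one unfragmented UDP datagram.
 *   host_mtu:     MTU of the host's IP interface
 *   qdes_hdr_len: size of the backend's own header (assumed, supplied
 *                 by the caller)
 * Returns 0 if nothing fits.
 */
size_t guest_mtu_sketch(size_t host_mtu, size_t qdes_hdr_len)
{
    /* A header can never exceed the maximum UDP payload over IPv4. */
    assert(qdes_hdr_len <= 65507);

    size_t overhead = IPV4_HDR_LEN + UDP_HDR_LEN + qdes_hdr_len + ETH_HDR_LEN;
    return host_mtu > overhead ? host_mtu - overhead : 0;
}
```

With a 1500-byte host MTU and a hypothetical 4-byte header, this gives a
guest MTU of 1454 bytes. If the host is allowed to fragment larger
datagrams, the guest MTU would not have to be reduced at all.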
Overall, I think this is something that's pretty cool. I don't know how much
thought people give to the socket backends for real-world use, so I don't
know if this would be of much use to anyone. I am looking for feedback on
what the community thinks and for comments about the code. It's only my
second time writing more than 20 lines of C, so I'm sure I did some stupid
things. I have only tested on 64-bit x86 Linux systems so far.
Hopefully you all have good things to say. :)
mike