[Qemu-devel] RE: Guest bridge setup variations

2009-12-17 Thread Leonid Grossman


> -----Original Message-----
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Wednesday, December 16, 2009 6:16 AM
> To: virtualizat...@lists.linux-foundation.org
> Cc: Leonid Grossman; qemu-devel@nongnu.org
> Subject: Re: Guest bridge setup variations
> 
> On Wednesday 16 December 2009, Leonid Grossman wrote:
> > > > 3. Doing the bridging in the NIC using macvlan in passthrough
> > > > mode. This lowers the CPU utilization further compared to 2,
> > > > at the expense of limiting throughput by the performance of
> > > > the PCIe interconnect to the adapter. Whether or not this
> > > > is a win is workload dependent.
> >
> > This is certainly true today for PCIe 1.1 and 2.0 devices, but as
> > NICs move to PCIe 3.0 (while remaining almost exclusively dual-port
> > 10GbE for a long while), EVB internal bandwidth will significantly
> > exceed external bandwidth. So, #3 can become a win for most
> > inter-guest workloads.
> 
> Right, it's also hardware dependent, but it usually comes down
> to whether it's cheaper to spend CPU cycles or to spend IO bandwidth.
> 
> I would be surprised if all future machines with PCIe 3.0 suddenly
> have a huge surplus of bandwidth but no CPU to keep up with that.
> 
> > > > Access controls now happen
> > > > in the NIC. This is not supported yet, due to the lack of
> > > > device drivers, but it will be an important scenario in the
> > > > future according to some people.
> >
> > Actually, x3100 10GbE drivers support this today via a sysfs
> > interface to the host driver that can choose to control VEB tables
> > (and therefore MAC addresses, VLAN memberships, etc. for all
> > passthru interfaces behind the VEB).
> 
> Ok, I didn't know about that.
> 
> > Of course a more generic vendor-independent interface will be
> > important in the future.
> 
> Right. I hope we can come up with something soon. I'll have a look at
> what your driver does and see if that can be abstracted in some way.

Sounds good; please let us know whether looking at the code/documentation
will suffice or whether you need a couple of cards to go along with the code.

> I expect that if we can find an interface between the kernel and
> device driver for two or three NIC implementations, it will be good
> enough to adapt to everyone else as well.

The interface will likely evolve along with EVB standards and other
developments, but the initial implementation can be pretty basic (and
vendor-independent).
Early IOV NIC deployments can benefit from an interface that sets a
couple of VF parameters missing from the "legacy" NIC interface - things
like a bandwidth limit and a list of MAC addresses (since putting a NIC
in promiscuous mode doesn't work well for a VEB, it is currently forced
to learn the addresses it is configured for).
The interface can also include querying IOV NIC capabilities like the
number of VFs, support for VEB and/or VEPA mode, etc., as well as
getting VF stats and MAC/VLAN tables - all in all, it is not a long list.
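
To make this concrete, here is a sketch of the kind of per-VF controls
being discussed, in a hypothetical iproute2-style syntax (illustration
only - no such vendor-independent interface exists today, and the exact
command names and arguments are made up):

  # give VF 0 on the physical port eth2 a fixed MAC address
  ip link set eth2 vf 0 mac 52:54:00:12:34:56
  # put VF 1 into VLAN 100
  ip link set eth2 vf 1 vlan 100
  # cap VF 1 at roughly 2 Gbit/s
  ip link set eth2 vf 1 rate 2000
  # query per-VF state (MAC/VLAN tables, stats, number of VFs, VEB/VEPA mode)
  ip link show eth2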


> 
>   Arnd




[Qemu-devel] RE: Guest bridge setup variations

2009-12-17 Thread Leonid Grossman
> > -----Original Message-----
> > From: virtualization-boun...@lists.linux-foundation.org
> > [mailto:virtualization-boun...@lists.linux-foundation.org] On Behalf
> > Of Arnd Bergmann
> > Sent: Tuesday, December 08, 2009 8:08 AM
> > To: virtualizat...@lists.linux-foundation.org
> > Cc: qemu-devel@nongnu.org
> > Subject: Guest bridge setup variations
> >
> > As promised, here is my small writeup on which setups I feel
> > are important in the long run for server-type guests. This
> > does not cover -net user, which is really for desktop kinds
> > of applications where you do not want to connect into the
> > guest from another IP address.
> >
> > I can see four separate setups that we may or may not want to
> > support, the main difference being how the forwarding between
> > guests happens:
> >
> > 1. The current setup, with a bridge and tun/tap devices on ports
> > of the bridge. This is what Gerhard's work on access controls is
> > focused on and the only option where the hypervisor actually
> > is in full control of the traffic between guests. CPU utilization
> > should be highest this way, and network management can be a burden,
> > because the controls are done through a Linux-, libvirt- and/or
> > Director-specific interface.
> >
> > 2. Using macvlan as a bridging mechanism, replacing the bridge
> > and tun/tap entirely. This should offer the best performance on
> > inter-guest communication, both in terms of throughput and
> > CPU utilization, but offer no access control for this traffic at
> > all.
> > Performance of guest-external traffic should be slightly better
> > than bridge/tap.
> >
> > 3. Doing the bridging in the NIC using macvlan in passthrough
> > mode. This lowers the CPU utilization further compared to 2,
> > at the expense of limiting throughput by the performance of
> > the PCIe interconnect to the adapter. Whether or not this
> > is a win is workload dependent. 

This is certainly true today for PCIe 1.1 and 2.0 devices, but as NICs
move to PCIe 3.0 (while remaining almost exclusively dual-port 10GbE for
a long while), EVB internal bandwidth will significantly exceed external
bandwidth. So, #3 can become a win for most inter-guest workloads.
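
A rough back-of-the-envelope comparison (approximate usable rates per
direction, ignoring protocol overhead) illustrates the point:

  dual-port 10GbE, external:  2 x 10 Gbit/s           = 20 Gbit/s
  PCIe 2.0 x8 to the adapter: 8 x 5 GT/s x 8b/10b     ~ 32 Gbit/s
  PCIe 3.0 x8 to the adapter: 8 x 8 GT/s x 128b/130b  ~ 63 Gbit/s

With PCIe 3.0, traffic that is switched inside the EVB and never leaves
the external ports has far more headroom than wire-rate external
traffic, which is what makes #3 attractive for inter-guest workloads.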

> > Access controls now happen
> > in the NIC. This is not supported yet, due to the lack of
> > device drivers, but it will be an important scenario in the future
> > according to some people.

Actually, x3100 10GbE drivers support this today via a sysfs interface
to the host driver that can choose to control VEB tables (and therefore
MAC addresses, VLAN memberships, etc. for all passthru interfaces behind
the VEB).
Of course a more generic vendor-independent interface will be important
in the future.
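
For illustration only (the sysfs paths and attribute names below are
invented for this example, not the actual x3100 driver layout), host-side
control of the VEB tables could look something like:

  # list the MAC addresses the VEB will forward to VF 2
  cat /sys/class/net/eth2/device/sriov/vf2/mac_list
  # allow an additional MAC address and a VLAN membership for that VF
  echo 52:54:00:12:34:56 > /sys/class/net/eth2/device/sriov/vf2/add_mac
  echo 100 > /sys/class/net/eth2/device/sriov/vf2/add_vlan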

> >
> > 4. Using macvlan for actual VEPA on the outbound interface.
> > This is mostly interesting because it makes the network access
> > controls visible in an external switch that is already managed.
> > CPU utilization and guest-external throughput should be
> > identical to 3, but inter-guest latency can only be worse because
> > all frames go through the external switch.
> >
> > In case 2 through 4, we have the choice between macvtap and
> > the raw packet interface for connecting macvlan to qemu.
> > Raw sockets are better tested right now, while macvtap has
> > better permission management (i.e. it does not require
> > CAP_NET_ADMIN). Neither one is upstream though at the
> > moment. The raw driver only requires qemu patches, while
> > macvtap requires both a new kernel driver and a trivial change
> > in qemu.
> >
> > In all four cases, vhost-net could be used to move the workload
> > from user space into the kernel, which may be an advantage.
> > The decision for or against vhost-net is entirely independent of
> > the other decisions.
> >
> > Arnd




[Qemu-devel] Re: Guest bridge setup variations

2009-12-16 Thread Arnd Bergmann
On Wednesday 16 December 2009, Leonid Grossman wrote:
> > > 3. Doing the bridging in the NIC using macvlan in passthrough
> > > mode. This lowers the CPU utilization further compared to 2,
> > > at the expense of limiting throughput by the performance of
> > > the PCIe interconnect to the adapter. Whether or not this
> > > is a win is workload dependent. 
> 
> This is certainly true today for PCIe 1.1 and 2.0 devices, but as NICs
> move to PCIe 3.0 (while remaining almost exclusively dual-port 10GbE
> for a long while), EVB internal bandwidth will significantly exceed
> external bandwidth. So, #3 can become a win for most inter-guest
> workloads.

Right, it's also hardware dependent, but it usually comes down
to whether it's cheaper to spend CPU cycles or to spend IO bandwidth.

I would be surprised if all future machines with PCIe 3.0 suddenly have
a huge surplus of bandwidth but no CPU to keep up with that.

> > > Access controls now happen
> > > in the NIC. This is not supported yet, due to the lack of
> > > device drivers, but it will be an important scenario in the future
> > > according to some people.
> 
> Actually, x3100 10GbE drivers support this today via a sysfs interface
> to the host driver that can choose to control VEB tables (and therefore
> MAC addresses, VLAN memberships, etc. for all passthru interfaces
> behind the VEB).

Ok, I didn't know about that.

> Of course a more generic vendor-independent interface will be important
> in the future.

Right. I hope we can come up with something soon. I'll have a look at
what your driver does and see if that can be abstracted in some way.
I expect that if we can find an interface between the kernel and device
driver for two or three NIC implementations, it will be good enough
to adapt to everyone else as well.

Arnd 




Re: [Qemu-devel] Re: Guest bridge setup variations

2009-12-10 Thread Alexander Graf

On 10.12.2009, at 21:20, Arnd Bergmann wrote:

> On Thursday 10 December 2009 19:14:28 Alexander Graf wrote:
>>> This is something I also have been thinking about, but it is not what
>>> I was referring to above. I think it would be good to keep the three
>>> cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
>>> perspective, so using macvlan as an infrastructure for all of them
>>> sounds reasonable to me.
>> 
>> Oh, so you'd basically do -net vt-d,if=eth0 and the rest would
>> automatically work? That's a pretty slick idea!
> 
> I was only referring to how they get set up under the covers, e.g.
> creating the virtual device, configuring the MAC address etc, not
> the qemu side, but that would probably make sense as well.
> 
> Or even better, qemu should probably not even know the difference
> between macvlan and VT-d. In both cases, it would open a macvtap
> file, but for VT-d adapters, the macvlan infrastructure can
> use hardware support, much in the way that VLAN tagging gets
> offloaded automatically to the hardware.

Well, VT-d means we use PCI passthrough. But it probably makes sense to have a
-net bridge,if=eth0 that automatically uses whatever is around (PCI
passthrough, macvtap, Anthony's bridge script, etc.). Of course we should
leverage VMDq for macvtap whenever available :-).
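
Roughly, the difference for the user would look like this (the first
command uses today's qemu options, the second the proposed - and so far
hypothetical - one):

  # today: the backend is chosen explicitly, e.g. a tap device on a host bridge
  qemu -net nic,model=virtio -net tap,ifname=tap0,script=/etc/qemu-ifup ...
  # proposed: name only the uplink; qemu/management picks PCI passthrough,
  # macvtap or bridge+tap, whichever is available
  qemu -net nic,model=virtio -net bridge,if=eth0 ...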

Alex



Re: [Qemu-devel] Re: Guest bridge setup variations

2009-12-10 Thread Arnd Bergmann
On Thursday 10 December 2009 19:14:28 Alexander Graf wrote:
> > This is something I also have been thinking about, but it is not what
> > I was referring to above. I think it would be good to keep the three
> > cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
> > perspective, so using macvlan as an infrastructure for all of them
> > sounds reasonable to me.
> 
> Oh, so you'd basically do -net vt-d,if=eth0 and the rest would
> automatically work? That's a pretty slick idea!

I was only referring to how they get set up under the covers, e.g.
creating the virtual device, configuring the MAC address etc, not
the qemu side, but that would probably make sense as well.

Or even better, qemu should probably not even know the difference
between macvlan and VT-d. In both cases, it would open a macvtap
file, but for VT-d adapters, the macvlan infrastructure can
use hardware support, much in the way that VLAN tagging gets
offloaded automatically to the hardware.
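
As a sketch of how that could look on the host side (macvtap is not
upstream yet, so the exact commands are assumptions based on the current
macvlan/iproute2 style):

  # create a macvtap endpoint on top of the uplink (plain NIC or VT-d/VMDq one)
  ip link add link eth0 name macvtap0 type macvtap mode bridge
  # the driver exposes a character device /dev/tapN, where N is the ifindex
  N=$(cat /sys/class/net/macvtap0/ifindex)
  # qemu only needs an open fd on that device, no CAP_NET_ADMIN required
  qemu -net nic,model=virtio -net tap,fd=3 3<>/dev/tap$N ...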

Arnd <><




[Qemu-devel] Re: Guest bridge setup variations

2009-12-10 Thread Alexander Graf

On 10.12.2009, at 15:18, Arnd Bergmann wrote:

> On Thursday 10 December 2009, Fischer, Anna wrote:
>>> 
>>> 3. Doing the bridging in the NIC using macvlan in passthrough
>>> mode. This lowers the CPU utilization further compared to 2,
>>> at the expense of limiting throughput by the performance of
>>> the PCIe interconnect to the adapter. Whether or not this
>>> is a win is workload dependent. Access controls now happen
>>> in the NIC. This is not supported yet, due to the lack of
>>> device drivers, but it will be an important scenario in the future
>>> according to some people.
>> 
>> Can you differentiate this option from typical PCI pass-through mode?
>> It is not clear to me where macvlan sits in a setup where the NIC does
>> bridging.
> 
> In this setup (hypothetical so far, the code doesn't exist yet), we use
> the configuration logic of macvlan, but not the forwarding. This also
> doesn't do PCI pass-through but instead gives all the logical interfaces
> to the host, using only the bridging and traffic separation capabilities
> of the NIC, but not the PCI-separation.
> 
> Intel calls this mode VMDq, as opposed to SR-IOV, which implies
> the assignment of the adapter to a guest.
> 
> It was confusing of me to call it passthrough above, sorry for that.
> 
>> Typically, in a PCI pass-through configuration, all configuration goes
>> through the physical function device driver (and all data goes directly
>> to the NIC). Are you suggesting to use macvlan as a common
>> configuration layer that then configures the underlying NIC?
>> I could see some benefit in such a model, though I am not certain I
>> understand you correctly.
> 
> This is something I also have been thinking about, but it is not what
> I was referring to above. I think it would be good to keep the three
> cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
> perspective, so using macvlan as an infrastructure for all of them
> sounds reasonable to me.

Oh, so you'd basically do -net vt-d,if=eth0 and the rest would automatically 
work? That's a pretty slick idea!

Alex



[Qemu-devel] RE: Guest bridge setup variations

2009-12-10 Thread Fischer, Anna
> Subject: Guest bridge setup variations
> 
> As promised, here is my small writeup on which setups I feel
> are important in the long run for server-type guests. This
> does not cover -net user, which is really for desktop kinds
> of applications where you do not want to connect into the
> guest from another IP address.
> 
> I can see four separate setups that we may or may not want to
> support, the main difference being how the forwarding between
> guests happens:
> 
> 1. The current setup, with a bridge and tun/tap devices on ports
> of the bridge. This is what Gerhard's work on access controls is
> focused on and the only option where the hypervisor actually
> is in full control of the traffic between guests. CPU utilization should
> be highest this way, and network management can be a burden,
> because the controls are done through a Linux-, libvirt- and/or
> Director-specific interface.
> 
> 2. Using macvlan as a bridging mechanism, replacing the bridge
> and tun/tap entirely. This should offer the best performance on
> inter-guest communication, both in terms of throughput and
> CPU utilization, but offer no access control for this traffic at all.
> Performance of guest-external traffic should be slightly better
> than bridge/tap.
> 
> 3. Doing the bridging in the NIC using macvlan in passthrough
> mode. This lowers the CPU utilization further compared to 2,
> at the expense of limiting throughput by the performance of
> the PCIe interconnect to the adapter. Whether or not this
> is a win is workload dependent. Access controls now happen
> in the NIC. This is not supported yet, due to the lack of
> device drivers, but it will be an important scenario in the future
> according to some people.

Can you differentiate this option from typical PCI pass-through mode? It is not 
clear to me where macvlan sits in a setup where the NIC does bridging.

Typically, in a PCI pass-through configuration, all configuration goes through 
the physical function device driver (and all data goes directly to the NIC). 
Are you suggesting to use macvlan as a common configuration layer that then 
configures the underlying NIC? I could see some benefit in such a model, though 
I am not certain I understand you correctly.

Thanks,
Anna




[Qemu-devel] Re: Guest bridge setup variations

2009-12-10 Thread Arnd Bergmann
On Thursday 10 December 2009, Fischer, Anna wrote:
> > 
> > 3. Doing the bridging in the NIC using macvlan in passthrough
> > mode. This lowers the CPU utilization further compared to 2,
> > at the expense of limiting throughput by the performance of
> > the PCIe interconnect to the adapter. Whether or not this
> > is a win is workload dependent. Access controls now happen
> > in the NIC. This is not supported yet, due to the lack of
> > device drivers, but it will be an important scenario in the future
> > according to some people.
> 
> Can you differentiate this option from typical PCI pass-through mode?
> It is not clear to me where macvlan sits in a setup where the NIC does
> bridging.

In this setup (hypothetical so far, the code doesn't exist yet), we use
the configuration logic of macvlan, but not the forwarding. This also
doesn't do PCI pass-through but instead gives all the logical interfaces
to the host, using only the bridging and traffic separation capabilities
of the NIC, but not the PCI-separation.

Intel calls this mode VMDq, as opposed to SR-IOV, which implies
the assignment of the adapter to a guest.

It was confusing of me to call it passthrough above, sorry for that.

> Typically, in a PCI pass-through configuration, all configuration goes
> through the physical function device driver (and all data goes directly
> to the NIC). Are you suggesting to use macvlan as a common
> configuration layer that then configures the underlying NIC?
> I could see some benefit in such a model, though I am not certain I
> understand you correctly.

This is something I also have been thinking about, but it is not what
I was referring to above. I think it would be good to keep the three
cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
perspective, so using macvlan as an infrastructure for all of them
sounds reasonable to me.

The difference between VMDq and SR-IOV in that case would be
that the former uses a virtio-net driver in the guest and a hardware
driver in the host, while the latter uses a hardware driver in the guest
only. The data flow in these two cases would be identical, though, while
in the classic macvlan case the data forwarding decisions are made in
the host kernel.
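
A sketch of the contrast from the host's point of view (device names and
the exact options are only examples, not a settled interface):

  # VMDq-style: host driver plus macvtap on the uplink, guest runs virtio-net
  ip link add link eth0 name macvtap0 type macvtap mode passthru
  # SR-IOV: the VF itself is handed to the guest, which runs the hardware driver
  qemu -device pci-assign,host=01:10.0 ...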

Arnd