Re: copyless virtio net thoughts?

2009-02-19 Thread Simon Horman
On Thu, Feb 19, 2009 at 10:06:17PM +1030, Rusty Russell wrote:
> On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 
> > > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > > normal NICs.  So far I have some very sketched-out patches: for the
> > > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returned it to the guest.  This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> > 
> > Hi Rusty,
> > 
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet).
> 
> Not quite: I think PCI passthrough IMHO is the *wrong* way to do it: it
> makes migrate complicated (if not impossible), and requires emulation or
> the same NIC on the destination host.
> 
> This would be the *host* seeing the virtual functions as multiple NICs,
> then the ability to attach a given NIC directly to a process.
> 
> This isn't guest-visible: the kvm process is configured to connect
> directly to a NIC, rather than (say) bridging through the host.

Hi Rusty, Hi Chris,

Thanks for the clarification.

I think the approach that Xen recommends for migration is to use a bonding
device that uses the pass-through device when present and falls back to a
virtual NIC otherwise.

The idea that you outline above does sound somewhat cleaner :-)

> > If so, wouldn't this also be useful on machines that have multiple
> > NICs?
> 
> Yes, but mainly as a benchmark hack AFAICT :)

Ok, I was under the impression that at least in the Xen world it
was something people actually used. But I could easily be mistaken.

> Hope that clarifies, Rusty.

On Thu, Feb 19, 2009 at 03:37:52AM -0800, Chris Wright wrote:
> * Simon Horman (ho...@verge.net.au) wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > > normal NICs.  So far I have some very sketched-out patches: for the
> > > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returned it to the guest.  This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> > 
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet). If so, wouldn't this also be useful
> > on machines that have multiple NICs?
> 
> This would be the typical usecase for sr-iov.  But I think Rusty is
> referring to giving a nic "directly" to a guest but the guest is still
> seeing a virtio nic (not pass-through/device-assignment).  So there's
> no bridge, and zero copy so the dma buffers are supplied by guest,
> but host has the driver for the physical nic or the VF.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: copyless virtio net thoughts?

2009-02-19 Thread Arnd Bergmann
On Thursday 19 February 2009, Rusty Russell wrote:

> Not quite: I think PCI passthrough IMHO is the *wrong* way to do it:
> it makes migrate complicated (if not impossible), and requires
> emulation or the same NIC on the destination host.  
> 
> This would be the *host* seeing the virtual functions as multiple
> NICs, then the ability to attach a given NIC directly to a process.

I guess what you mean then is what Intel calls VMDq, not SR-IOV.
Eddie has some slides about this at
http://docs.huihoo.com/kvm/kvmforum2008/kdf2008_7.pdf .

The latest network cards support both operation modes, and it
appears to me that there is a place for both. VMDq gives you
the best performance without limiting flexibility, while SR-IOV
performance can in theory be even better, but at the cost of a
lot of flexibility and potentially of local (guest-to-guest)
performance.

AFAICT, any card that supports SR-IOV should also allow a
VMDq-like model, as you describe.

Arnd <><


Re: copyless virtio net thoughts?

2009-02-19 Thread Chris Wright
* Simon Horman (ho...@verge.net.au) wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > normal NICs.  So far I have some very sketched-out patches: for the
> > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > which returned it to the guest.  This bypasses all firewalling in the
> > host though; we're basically having the guest process drive the NIC
> > directly.
> 
> Can I clarify that the idea with utilising SR-IOV would be to assign
> virtual functions to guests? That is, something conceptually similar to
> PCI pass-through in Xen (although I'm not sure that anyone has virtual
> function pass-through working yet). If so, wouldn't this also be useful
> on machines that have multiple NICs?

This would be the typical use case for SR-IOV.  But I think Rusty is
referring to giving a NIC "directly" to a guest while the guest still
sees a virtio NIC (not pass-through/device-assignment).  So there's
no bridge, and it's zero copy because the DMA buffers are supplied by
the guest, but the host has the driver for the physical NIC or the VF.

thanks,
-chris


Re: copyless virtio net thoughts?

2009-02-19 Thread Rusty Russell
On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > 
> > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > normal NICs.  So far I have some very sketched-out patches: for the
> > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > which returned it to the guest.  This bypasses all firewalling in the
> > host though; we're basically having the guest process drive the NIC
> > directly.
> 
> Hi Rusty,
> 
> Can I clarify that the idea with utilising SR-IOV would be to assign
> virtual functions to guests? That is, something conceptually similar to
> PCI pass-through in Xen (although I'm not sure that anyone has virtual
> function pass-through working yet).

Not quite: I think PCI passthrough is the *wrong* way to do it: it makes
migration complicated (if not impossible), and requires emulation or the same
NIC on the destination host.

This would be the *host* seeing the virtual functions as multiple NICs, then
the ability to attach a given NIC directly to a process.

This isn't guest-visible: the kvm process is configured to connect directly to 
a NIC, rather than (say) bridging through the host.

> If so, wouldn't this also be useful
> on machines that have multiple NICs?

Yes, but mainly as a benchmark hack AFAICT :)

Hope that clarifies,
Rusty.


Re: copyless virtio net thoughts?

2009-02-19 Thread Rusty Russell
On Thursday 19 February 2009 02:54:06 Arnd Bergmann wrote:
> On Wednesday 18 February 2009, Rusty Russell wrote:
> 
> > 2) Direct NIC attachment
> > This is particularly interesting with SR-IOV or other multiqueue nics,
> > but for boutique cases or benchmarks, could be for normal NICs.  So
> > far I have some very sketched-out patches: for the attached nic 
> > dev_alloc_skb() gets an skb from the guest (which supplies them via
> > some kind of AIO interface), and a branch in netif_receive_skb()
> > which returned it to the guest.  This bypasses all firewalling in
> > the host though; we're basically having the guest process drive
> > the NIC directly.   
> 
> If this is not passing the PCI device directly to the guest, but
> uses your concept, wouldn't it still be possible to use the firewalling
> in the host? You can always inspect the headers, drop the frame, etc
> without copying the whole frame at any point.

It's possible, but you don't want routing or parsing, etc: the NIC
is just "directly" attached to the guest.

You could do it in qemu or whatever, but it would not be the kernel scheme
(netfilter/iptables).

> > 3) Direct interguest networking
> > Anthony has been thinking here: vmsplice has already been mentioned.
> > The idea of passing directly from one guest to another is an
> > interesting one: using dma engines might be possible too.  Again,
> > host can't firewall this traffic.  Simplest as a dedicated "internal
> > lan" NIC, but we could theoretically do a fast-path for certain MAC
> > addresses on a general guest NIC. 
> 
> Another option would be to use an SR-IOV adapter from multiple guests,
> with a virtual ethernet bridge in the adapter. This moves the overhead
> from the CPU to the bus and/or adapter, so it may or may not be a real
> benefit depending on the workload.

Yes, I guess this should work.  Even different SR-IOV adapters will simply
send to one another.  I'm not sure this obviates the desire for direct
inter-guest networking, which is more generic, though.

Thanks!
Rusty.


RE: copyless virtio net thoughts?

2009-02-18 Thread Dong, Eddie
Simon Horman wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell
> wrote: 
>> 
>> 2) Direct NIC attachment This is particularly
>> interesting with SR-IOV or other multiqueue nics, but
>> for boutique cases or benchmarks, could be for normal
>> NICs.  So far I have some very sketched-out patches: for
>> the attached nic dev_alloc_skb() gets an skb from the
>> guest (which supplies them via some kind of AIO
>> interface), and a branch in netif_receive_skb() which
>> returned it to the guest.  This bypasses all firewalling
>> in the host though; we're basically having the guest
>> process drive the NIC directly.  
> 
> Hi Rusty,
> 
> Can I clarify that the idea with utilising SR-IOV would
> be to assign virtual functions to guests? That is,
> something conceptually similar to PCI pass-through in Xen
> (although I'm not sure that anyone has virtual function
> pass-through working yet). If so, wouldn't this also be
> useful on machines that have multiple NICs? 
> 
Yes, and we have successfully got it running by assigning a VF to the guest in
both Xen & KVM, but we are still working on pushing those patches out, since
it needs Linux PCI subsystem support & driver support.

Thx, eddie


Re: copyless virtio net thoughts?

2009-02-18 Thread Simon Horman
On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> 
> 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> other multiqueue nics, but for boutique cases or benchmarks, could be for
> normal NICs.  So far I have some very sketched-out patches: for the
> attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> them via some kind of AIO interface), and a branch in netif_receive_skb()
> which returned it to the guest.  This bypasses all firewalling in the
> host though; we're basically having the guest process drive the NIC
> directly.

Hi Rusty,

Can I clarify that the idea with utilising SR-IOV would be to assign
virtual functions to guests? That is, something conceptually similar to
PCI pass-through in Xen (although I'm not sure that anyone has virtual
function pass-through working yet). If so, wouldn't this also be useful
on machines that have multiple NICs?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: copyless virtio net thoughts?

2009-02-18 Thread Arnd Bergmann
On Wednesday 18 February 2009, Rusty Russell wrote:

> 2) Direct NIC attachment
> This is particularly interesting with SR-IOV or other multiqueue nics,
> but for boutique cases or benchmarks, could be for normal NICs.  So
> far I have some very sketched-out patches: for the attached nic 
> dev_alloc_skb() gets an skb from the guest (which supplies them via
> some kind of AIO interface), and a branch in netif_receive_skb()
> which returned it to the guest.  This bypasses all firewalling in
> the host though; we're basically having the guest process drive
> the NIC directly.   

If this is not passing the PCI device directly to the guest, but
uses your concept, wouldn't it still be possible to use the firewalling
in the host? You can always inspect the headers, drop the frame, etc
without copying the whole frame at any point.

When it gets to the point of actually giving the device (a real PF or an
SR-IOV VF) to one guest, you reach the point where you can't do local
firewalling any more.
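
For illustration, a header-only check of the kind described above might look
roughly like this (a hedged sketch; the hook name and the hard-coded rule are
invented, and nothing here comes from the patches under discussion):

#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/skbuff.h>

/*
 * Hypothetical host-side check run before an skb is handed to the
 * directly-attached guest: look at the headers in place (no copy of
 * the payload) and decide whether to drop the frame.
 */
static bool host_allows_frame_for_guest(struct sk_buff *skb)
{
	const struct iphdr *iph;

	if (skb->protocol != htons(ETH_P_IP))
		return true;			/* only IPv4 filtered here */

	if (!pskb_may_pull(skb, sizeof(*iph)))
		return false;			/* malformed, drop */

	/* Assumes the driver already pulled the Ethernet header. */
	iph = (const struct iphdr *)skb->data;

	/* Example rule: never deliver frames claiming a loopback source. */
	return !ipv4_is_loopback(iph->saddr);
}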

> 3) Direct interguest networking
> Anthony has been thinking here: vmsplice has already been mentioned.
> The idea of passing directly from one guest to another is an
> interesting one: using dma engines might be possible too.  Again,
> host can't firewall this traffic.  Simplest as a dedicated "internal
> lan" NIC, but we could theoretically do a fast-path for certain MAC
> addresses on a general guest NIC. 

Another option would be to use an SR-IOV adapter from multiple guests,
with a virtual ethernet bridge in the adapter. This moves the overhead
from the CPU to the bus and/or adapter, so it may or may not be a real
benefit depending on the workload.

Arnd <><


Re: copyless virtio net thoughts?

2009-02-18 Thread Herbert Xu
On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
>
> 4) Multiple queues
> This is Herbert's.  Should be fairly simple to add; it was in the back of my 
> mind when we started.  Not sure whether the queues should be static or 
> dynamic (imagine direct interguest networking, one queue pair for each other 
> guest), and how xmit queues would be selected by the guest (anything 
> anywhere, or dst mac?).

The primary purpose of multiple queues is to maximise CPU utilisation,
so the number of queues is simply dependent on the number of CPUs
allotted to the guest.
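
As a sketch of that sizing rule (illustrative names only, not driver code):

#include <linux/cpumask.h>
#include <linux/kernel.h>

/* One queue pair per CPU allotted to the guest, capped by whatever the
 * (virtual) device can offer.  Purely illustrative. */
static unsigned int virtnet_desired_queue_pairs(unsigned int max_dev_pairs)
{
	return min_t(unsigned int, num_online_cpus(), max_dev_pairs);
}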

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: copyless virtio net thoughts?

2009-02-18 Thread Rusty Russell
On Thursday 05 February 2009 12:37:32 Chris Wright wrote:
> There's been a number of different discussions re: getting copyless virtio
> net (esp. for KVM).  This is just a poke in that general direction to
> stir the discussion.  I'm interested to hear current thoughts?

This thread seems to have died out, time for me to weigh in!

There are four promising areas that I see when looking at virtio_net 
performance.  I list them all here because they may interact:

1) Async tap access.
2) Direct NIC attachment.
3) Direct interguest networking.
4) Multiqueue virtio_net.

1) Async tap access
Either via aio, or something like the prototype virtio_ring patches I produced 
last year.  This is potentially copyless networking for xmit (bar header), with 
one copy on recv.

2) Direct NIC attachment
This is particularly interesting with SR-IOV or other multiqueue nics, but for
boutique cases or benchmarks, could be for normal NICs.  So far I have some
very sketched-out patches: for the attached NIC, dev_alloc_skb() gets an skb
from the guest (which supplies them via some kind of AIO interface), and a
branch in netif_receive_skb() returns it to the guest.  This bypasses all
firewalling in the host though; we're basically having the guest process
drive the NIC directly.
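
For illustration, a minimal host-side sketch of where such a branch could sit
might look like the following (this is not the sketched-out patches
themselves; the guest_attach structure and hook names are invented):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical per-device state set up when a kvm process "directly
 * attaches" to a NIC; the guest_rx hook hands the skb (whose data pages
 * the guest supplied via the AIO interface) back to that process. */
struct guest_attach {
	bool (*guest_rx)(struct guest_attach *ga, struct sk_buff *skb);
};

static struct guest_attach *dev_guest_attach(const struct net_device *dev)
{
	return NULL;	/* placeholder: look up the attachment, if any */
}

/* The branch would run early in netif_receive_skb(): if the device is
 * guest-attached, the frame never enters the host stack, so netfilter
 * and friends never see it -- exactly the trade-off described above. */
static bool deliver_to_attached_guest(struct sk_buff *skb)
{
	struct guest_attach *ga = dev_guest_attach(skb->dev);

	return ga && ga->guest_rx(ga, skb);
}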

3) Direct interguest networking
Anthony has been thinking here: vmsplice has already been mentioned.  The idea 
of passing directly from one guest to another is an interesting one: using dma 
engines might be possible too.  Again, host can't firewall this traffic.  
Simplest as a dedicated "internal lan" NIC, but we could theoretically do a 
fast-path for certain MAC addresses on a general guest NIC.
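
As a rough userspace illustration of the vmsplice() pattern mentioned above
(stand-in buffers and pipes only, none of this is virtio/kvm code, and
whether the kernel really avoids the copy is best-effort):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *frame;
	int in[2], out[2];

	if (posix_memalign((void **)&frame, psz, psz))
		return 1;
	memcpy(frame, "guest frame payload", 20);

	if (pipe(in) || pipe(out))
		return 1;

	struct iovec iov = { .iov_base = frame, .iov_len = (size_t)psz };

	/* Producer (one guest's side): gift the page backing the frame. */
	if (vmsplice(in[1], &iov, 1, SPLICE_F_GIFT) < 0)
		return 1;

	/* Forwarder (the inter-guest path): move it pipe-to-pipe, with no
	 * copy in the ideal case. */
	if (splice(in[0], NULL, out[1], NULL, (size_t)psz, SPLICE_F_MOVE) < 0)
		return 1;

	/* Consumer (other guest's side) -- a read() here just proves the
	 * data arrived; a zero-copy consumer would splice it onward. */
	char check[32] = "";
	ssize_t n = read(out[0], check, sizeof(check) - 1);
	printf("received %zd bytes: %s\n", n, check);
	return 0;
}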

4) Multiple queues
This is Herbert's.  Should be fairly simple to add; it was in the back of my 
mind when we started.  Not sure whether the queues should be static or dynamic 
(imagine direct interguest networking, one queue pair for each other guest), 
and how xmit queues would be selected by the guest (anything anywhere, or dst 
mac?).
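
For the "dst mac" option, guest-side queue selection could be as simple as
hashing the destination MAC, along these lines (a sketch only; the hook
mirrors the kernel's ndo_select_queue, but none of this is existing
virtio_net code):

#include <linux/etherdevice.h>
#include <linux/jhash.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hash the destination MAC so all traffic towards a given peer (e.g.
 * another guest) sticks to one TX queue; illustrative only. */
static u16 virtnet_select_txq_by_dmac(struct net_device *dev,
				      struct sk_buff *skb)
{
	const struct ethhdr *eth = (const struct ethhdr *)skb->data;
	u32 hash = jhash(eth->h_dest, ETH_ALEN, 0);

	return (u16)(hash % dev->real_num_tx_queues);
}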

Anyone else want to make comments?

Thanks,
Rusty.


Re: copyless virtio net thoughts?

2009-02-07 Thread David Miller
From: Arnd Bergmann 
Date: Sat, 7 Feb 2009 12:56:06 +0100

> Having the load spread evenly over all guests sounds like a much rarer
> use case.

Totally agreed.


Re: copyless virtio net thoughts?

2009-02-07 Thread Arnd Bergmann
On Friday 06 February 2009, Avi Kivity wrote:
> > Well, these guests will suck both on baremetal and in virtualisation,
> > big deal :) Multiqueue at 10GbE speeds and above is simply not an
> > optional feature.
> >   
> 
> Each guest may only use a part of the 10Gb/s bandwidth, if you have 10 
> guests each using 1Gb/s, then we should be able to support this without 
> multiqueue in the guests.

I would expect that even people with 10 simultaneous guests would like to
be able to saturate the link when only one or two of them are doing much
traffic on the interface.

Having the load spread evenly over all guests sounds like a much rarer
use case.

Arnd <><


Re: copyless virtio net thoughts?

2009-02-06 Thread Avi Kivity

Herbert Xu wrote:
> On Fri, Feb 06, 2009 at 10:46:37AM +0200, Avi Kivity wrote:
>> The guest's block layer is copyless.  The host block layer is -><- this
>> far from being copyless -- all we need is preadv()/pwritev() or to
>> replace our thread pool implementation in qemu with linux-aio.
>> Everything else is copyless.
>>
>> Since we are actively working on this, expect this limitation to
>> disappear soon.
>
> Great, when that happens I'll promise to revisit zero-copy transmit :)

I was hoping to get some concurrency here, but okay.

>> I support this, but it should be in addition to copylessness, not on its
>> own.
>
> I was talking about it in the context of zero-copy receive, where
> you mentioned that the virtio/kvm copy may not occur on the CPU of
> the guest's copy.
>
> My point is that using multiqueue you can avoid this change of CPU.
>
> But yeah I think zero-copy receive is much more useful than zero-
> copy transmit at the moment.  Although I'd prefer to wait for
> you guys to finish the block layer work before contemplating
> pushing the copy on receive into the guest :)

We'll get the block layer done soon, so it won't be a barrier.

>> - many guests will not support multiqueue
>
> Well, these guests will suck both on baremetal and in virtualisation,
> big deal :) Multiqueue at 10GbE speeds and above is simply not an
> optional feature.

Each guest may only use a part of the 10Gb/s bandwidth, if you have 10
guests each using 1Gb/s, then we should be able to support this without
multiqueue in the guests.



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: copyless virtio net thoughts?

2009-02-06 Thread Herbert Xu
On Fri, Feb 06, 2009 at 10:46:37AM +0200, Avi Kivity wrote:
>
> The guest's block layer is copyless.  The host block layer is -><- this  
> far from being copyless -- all we need is preadv()/pwritev() or to  
> replace our thread pool implementation in qemu with linux-aio.   
> Everything else is copyless.
>
> Since we are actively working on this, expect this limitation to  
> disappear soon.

Great, when that happens I'll promise to revisit zero-copy transmit :)

> I support this, but it should be in addition to copylessness, not on its  
> own.

I was talking about it in the context of zero-copy receive, where
you mentioned that the virtio/kvm copy may not occur on the CPU of
the guest's copy.

My point is that using multiqueue you can avoid this change of CPU.

But yeah I think zero-copy receive is much more useful than zero-
copy transmit at the moment.  Although I'd prefer to wait for
you guys to finish the block layer work before contemplating
pushing the copy on receive into the guest :)

> - many guests will not support multiqueue

Well, these guests will suck both on baremetal and in virtualisation,
big deal :) Multiqueue at 10GbE speeds and above is simply not an
optional feature.

> - for some threaded workloads, you cannot predict where the final read()  
> will come from; this renders multiqueue ineffective for keeping cache  
> locality
>
> - usually you want virtio to transfer large amounts of data; but if you  
> want your copies to be cache-hot, you need to limit transfers to half  
> the cache size (a quarter if hyperthreading); this limits virtio  
> effectiveness

Agreed on both counts.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: copyless virtio net thoughts?

2009-02-06 Thread Avi Kivity

Herbert Xu wrote:
> On Thu, Feb 05, 2009 at 02:37:07PM +0200, Avi Kivity wrote:
>> I believe that copyless networking is absolutely essential.
>
> I used to think it was important, but I'm now of the opinion
> that it's quite useless for virtualisation as it stands.
>
>> For transmit, copyless is needed to properly support sendfile() type
>> workloads - http/ftp/nfs serving.  These are usually high-bandwidth,
>> cache-cold workloads where a copy is most expensive.
>
> This is totally true for baremetal, but useless for virtualisation
> right now because the block layer is not zero-copy.  That is, the
> data is going to be cache hot anyway so zero-copy networking doesn't
> buy you much at all.

The guest's block layer is copyless.  The host block layer is -><- this
far from being copyless -- all we need is preadv()/pwritev() or to
replace our thread pool implementation in qemu with linux-aio.
Everything else is copyless.

Since we are actively working on this, expect this limitation to
disappear soon.

(Even if it doesn't, the effect of block layer copies is multiplied by
the cache miss percentage, which can be quite low for many workloads; but
again, we're not building on that.)

> Please also recall that for the time being, block speeds are
> way slower than network speeds.  So the really interesting case
> is actually network-to-network transfers.  Again due to the
> RX copy this is going to be cache hot.

Block speeds are not way slower.  We're at 4Gb/s for Fibre and 10Gb/s
for networking.  With dual channels or a decent cache hit rate they're
evenly matched.

>> For receive, the guest will almost always do an additional copy, but it
>> will most likely do the copy from another cpu.  Xen netchannel2
>
> That's what we should strive to avoid.  The best scenario with
> modern 10GbE NICs is to stay on one CPU if at all possible.  The
> NIC will pick a CPU when it delivers the packet into one of the
> RX queues and we should stick with it for as long as possible.
>
> So what I'd like to see next in virtualised networking is virtual
> multiqueue support in guest drivers.  No I'm not talking about
> making one or more of the physical RX/TX queues available to the
> guest (aka passthrough), but actually turning something like the
> virtio-net interface into a multiqueue interface.

I support this, but it should be in addition to copylessness, not on its
own.

- many guests will not support multiqueue

- for some threaded workloads, you cannot predict where the final read()
  will come from; this renders multiqueue ineffective for keeping cache
  locality

- usually you want virtio to transfer large amounts of data; but if you
  want your copies to be cache-hot, you need to limit transfers to half
  the cache size (a quarter if hyperthreading); this limits virtio
  effectiveness



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: copyless virtio net thoughts?

2009-02-05 Thread Herbert Xu
On Thu, Feb 05, 2009 at 02:37:07PM +0200, Avi Kivity wrote:
>
> I believe that copyless networking is absolutely essential.

I used to think it was important, but I'm now of the opinion
that it's quite useless for virtualisation as it stands.

> For transmit, copyless is needed to properly support sendfile() type  
> workloads - http/ftp/nfs serving.  These are usually high-bandwidth,  
> cache-cold workloads where a copy is most expensive.

This is totally true for baremetal, but useless for virtualisation
right now because the block layer is not zero-copy.  That is, the
data is going to be cache hot anyway so zero-copy networking doesn't
buy you much at all.

Please also recall that for the time being, block speeds are
way slower than network speeds.  So the really interesting case
is actually network-to-network transfers.  Again due to the
RX copy this is going to be cache hot.

> For receive, the guest will almost always do an additional copy, but it  
> will most likely do the copy from another cpu.  Xen netchannel2  

That's what we should strive to avoid.  The best scenario with
modern 10GbE NICs is to stay on one CPU if at all possible.  The
NIC will pick a CPU when it delivers the packet into one of the
RX queues and we should stick with it for as long as possible.

So what I'd like to see next in virtualised networking is virtual
multiqueue support in guest drivers.  No I'm not talking about
making one or more of the physical RX/TX queues available to the
guest (aka passthrough), but actually turning something like the
virtio-net interface into a multiqueue interface.

This is the best way to get cache locality and minimise CPU waste.

So I'm certainly not rushing out to do any zero-copy virtual
networking.  However, I would like to start working on a virtual
multiqueue NIC interface.
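
In rough data-structure terms, the kind of interface being described might
look like this (names invented for illustration; at the time of this thread
no such multiqueue virtio-net interface exists):

#include <linux/virtio.h>

/*
 * One RX/TX virtqueue pair per guest CPU, with the pair's interrupt
 * affinitized to that CPU, so a flow the host (or NIC) steers into
 * queue N is received, copied and read on the same guest CPU.
 * Illustrative sketch only.
 */
struct virtnet_queue_pair {
	struct virtqueue *rx_vq;	/* filled by the host for CPU n    */
	struct virtqueue *tx_vq;	/* used for transmit from CPU n    */
	unsigned int      cpu;		/* CPU (and vector) this pair owns */
};

struct virtnet_mq_dev {
	unsigned int              nr_pairs;	/* typically == guest CPUs */
	struct virtnet_queue_pair pairs[];	/* indexed by CPU          */
};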

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: copyless virtio net thoughts?

2009-02-05 Thread Anthony Liguori

Avi Kivity wrote:
> Chris Wright wrote:
>> There's been a number of different discussions re: getting copyless virtio
>> net (esp. for KVM).  This is just a poke in that general direction to
>> stir the discussion.  I'm interested to hear current thoughts
>
> I believe that copyless networking is absolutely essential.
>
> For transmit, copyless is needed to properly support sendfile() type
> workloads - http/ftp/nfs serving.  These are usually high-bandwidth,
> cache-cold workloads where a copy is most expensive.
>
> For receive, the guest will almost always do an additional copy, but
> it will most likely do the copy from another cpu.  Xen netchannel2
> mitigates this somewhat by having the guest request the hypervisor to
> perform the copy when the rx interrupt is processed, but this may
> still be too early (the packet may be destined to a process that is on
> another vcpu), and the extra hypercall is expensive.
>
> In my opinion, it would be ideal to linux-aio enable taps and packet
> sockets.  io_submit() allows submitting multiple buffers in one
> syscall and supports scatter/gather.  io_getevents() supports
> dequeuing multiple packet completions in one syscall.

splice() has some nice properties too.  It disconnects the notion of
moving packets around from actually copying them.  It also fits well
into a more performant model of inter-guest I/O.  You can't publish
multiple buffers with splice, but I don't think we can do that today,
practically speaking, because of mergeable RX buffers.  You would have
to extend the linux-aio interface to hand it a bunch of buffers and for
it to tell you where the packet boundaries were.

Regards,

Anthony Liguori




Re: copyless virtio net thoughts?

2009-02-05 Thread Avi Kivity

Chris Wright wrote:
> There's been a number of different discussions re: getting copyless virtio
> net (esp. for KVM).  This is just a poke in that general direction to
> stir the discussion.  I'm interested to hear current thoughts

I believe that copyless networking is absolutely essential.

For transmit, copyless is needed to properly support sendfile() type
workloads - http/ftp/nfs serving.  These are usually high-bandwidth,
cache-cold workloads where a copy is most expensive.

For receive, the guest will almost always do an additional copy, but it
will most likely do the copy from another cpu.  Xen netchannel2
mitigates this somewhat by having the guest request the hypervisor to
perform the copy when the rx interrupt is processed, but this may still
be too early (the packet may be destined to a process that is on another
vcpu), and the extra hypercall is expensive.

In my opinion, it would be ideal to linux-aio enable taps and packet
sockets.  io_submit() allows submitting multiple buffers in one syscall
and supports scatter/gather.  io_getevents() supports dequeuing multiple
packet completions in one syscall.
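
As a hedged illustration of what that could look like from userspace if tap
fds accepted linux-aio (which is exactly the missing piece), using libaio
with /dev/null standing in for the tap fd (build with -laio):

#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>

#define NBUF  4
#define BUFSZ 2048

int main(void)
{
	io_context_t ctx = 0;
	struct iocb iocbs[NBUF], *ptrs[NBUF];
	struct io_event events[NBUF];
	char *bufs[NBUF];
	int i, n, fd;

	/* Stand-in fd: a real version would open /dev/net/tun and attach
	 * it to a tap interface with TUNSETIFF. */
	fd = open("/dev/null", O_RDONLY);
	if (fd < 0 || io_setup(NBUF, &ctx) < 0)
		return 1;

	/* Post several receive buffers in a single io_submit() call. */
	for (i = 0; i < NBUF; i++) {
		bufs[i] = malloc(BUFSZ);
		io_prep_pread(&iocbs[i], fd, bufs[i], BUFSZ, 0);
		ptrs[i] = &iocbs[i];
	}
	if (io_submit(ctx, NBUF, ptrs) < 0)
		return 1;

	/* Reap however many buffers have completed, again in one syscall. */
	n = io_getevents(ctx, 1, NBUF, events, NULL);
	for (i = 0; i < n; i++)
		printf("buffer %p completed with %ld bytes\n",
		       events[i].obj->u.c.buf, (long)events[i].res);

	io_destroy(ctx);
	return 0;
}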


--
error compiling committee.c: too many arguments to function



copyless virtio net thoughts?

2009-02-04 Thread Chris Wright
There's been a number of different discussions re: getting copyless virtio
net (esp. for KVM).  This is just a poke in that general direction to
stir the discussion.  I'm interested to hear current thoughts?
 
thanks
-chris