Re: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-05-09 Thread Michael S. Tsirkin
On Sat, May 08, 2010 at 03:55:48PM +0800, Xin, Xiaohui wrote:
 Michael,
 Sorry, somehow I missed this mail. :-(
 
 Here, we have considered two ways to utilize the page constructor
 API to dispense the user buffers.

 One:    Modify __alloc_skb() a bit so that it only allocates the sk_buff
         structure itself, while the data pointer points to a user buffer
         coming from a page constructor API. The shinfo of the skb then
         also comes from the guest. When a packet is received from
         hardware, skb->data is filled directly by h/w. This is what we
         have implemented.

 Pros:   We can avoid any copy here.
 Cons:   The guest virtio-net driver needs to allocate its skbs in almost
         the same way as the host NIC drivers, i.e. with the
         netdev_alloc_skb() size and the same reserved space at the head
         of the skb. Many NIC drivers match the guest and are fine with
         this, but some of the latest NIC drivers reserve special room in
         the skb head. To deal with that, we suggest providing a method in
         the guest virtio-net driver to ask the NIC driver for the
         parameters we are interested in once we know which device is
         bound for zero-copy, and then have the guest follow them. Is that
         reasonable?
 
 Do you still do this?
 
 Currently we still use the first way, but we now ignore the room that the
 host skb_reserve() would require when the device is doing zero-copy. This
 way we don't taint the guest virtio-net driver with a new method.
 
 Two:    Modify the driver to get user buffers allocated from a page
         constructor API (substituting alloc_page()); the user buffers are
         used as payload buffers and are filled by h/w directly when a
         packet is received. The driver should associate the pages with
         the skb (skb_shinfo(skb)->frags). For the head buffer, let the
         host allocate the skb and have h/w fill it. After that, the data
         filled into the host skb header is copied into the guest header
         buffer, which is submitted together with the payload buffers.

 Pros:   We care less about how the guest or the host allocates its
         buffers.
 Cons:   We still need a small copy here for the skb header.
  
 We are not sure which way is better here. This is the first thing we want
 to get comments on from the community. We would like the modifications to
 the network part to be generic, so they are not used only by the vhost-net
 backend but can also be used by a user application once the zero-copy
 device provides async read/write operations later.
 
 I commented on this in the past. Do you still want comments?
 
 Now we continue with the first way and try to push it. But any comments about 
 the two methods are still welcome.
 
 That's nice. The thing to do is probably to enable GSO/TSO
 and see what we get this way. Also, mergeable buffer support
 was recently posted and I hope to merge it for 2.6.35.
 You might want to take a look.
 
 I'm looking at the mergeable buffer support. I think GSO/GRO support with
 zero-copy needs it as well.
 Is GSO/TSO currently still unsupported by vhost-net?

GSO/TSO are currently supported with tap and macvtap; the
AF_PACKET socket backend still needs some work to enable GSO.

 -- 
 MST


RE: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-05-08 Thread Xin, Xiaohui
Michael,
Sorry, somehow I missed this mail. :-(

 Here, we have considered two ways to utilize the page constructor
 API to dispense the user buffers.

 One:    Modify __alloc_skb() a bit so that it only allocates the sk_buff
         structure itself, while the data pointer points to a user buffer
         coming from a page constructor API. The shinfo of the skb then
         also comes from the guest. When a packet is received from
         hardware, skb->data is filled directly by h/w. This is what we
         have implemented.

 Pros:   We can avoid any copy here.
 Cons:   The guest virtio-net driver needs to allocate its skbs in almost
         the same way as the host NIC drivers, i.e. with the
         netdev_alloc_skb() size and the same reserved space at the head
         of the skb. Many NIC drivers match the guest and are fine with
         this, but some of the latest NIC drivers reserve special room in
         the skb head. To deal with that, we suggest providing a method in
         the guest virtio-net driver to ask the NIC driver for the
         parameters we are interested in once we know which device is
         bound for zero-copy, and then have the guest follow them. Is that
         reasonable?

Do you still do this?

Currently we still use the first way, but we now ignore the room that the
host skb_reserve() would require when the device is doing zero-copy. This
way we don't taint the guest virtio-net driver with a new method.

 Two:    Modify the driver to get user buffers allocated from a page
         constructor API (substituting alloc_page()); the user buffers are
         used as payload buffers and are filled by h/w directly when a
         packet is received. The driver should associate the pages with
         the skb (skb_shinfo(skb)->frags). For the head buffer, let the
         host allocate the skb and have h/w fill it. After that, the data
         filled into the host skb header is copied into the guest header
         buffer, which is submitted together with the payload buffers.

 Pros:   We care less about how the guest or the host allocates its
         buffers.
 Cons:   We still need a small copy here for the skb header.
 
 We are not sure which way is better here. This is the first thing we want
 to get comments on from the community. We would like the modifications to
 the network part to be generic, so they are not used only by the vhost-net
 backend but can also be used by a user application once the zero-copy
 device provides async read/write operations later.

I commented on this in the past. Do you still want comments?

Now we continue with the first way and try to push it. But any comments about 
the two methods are still welcome.

That's nice. The thing to do is probably to enable GSO/TSO
and see what we get this way. Also, mergeable buffer support
was recently posted and I hope to merge it for 2.6.35.
You might want to take a look.

I'm looking at the mergeable buffer support. I think GSO/GRO support with
zero-copy needs it as well.
Is GSO/TSO currently still unsupported by vhost-net?
-- 
MST


RE: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-04-28 Thread Xin, Xiaohui
  The idea is simple: just pin the guest VM user space and then let the
  host NIC driver have the chance to DMA to it directly.
 
 Isn't it much easier to map the RX ring of the network device into the
 guest's address space, have DMA map calls translate guest addresses to
 physical/DMA addresses as well as do all of this crazy page pinning
 stuff, and provide the translations and protections via the IOMMU?

This means the guest would need to know how the specific network device
works, so we won't be able to, for example, move a guest between different
hosts. There are other problems: many physical systems do not have an
IOMMU, some guest OSes do not support DMA map calls, and doing a VM exit
on each DMA map call might turn out to be very slow. And so on.

This solution is what we can currently think of to implement zero-copy.
Some modifications are made to the net core to try to avoid network device
driver changes. The major change is to __alloc_skb(), to which we added a
dev parameter to indicate whether the device will DMA to/from a guest/user
buffer pointed to by the host skb->data. We also modify skb_release_data()
and skb_reserve(). It now works with the ixgbe driver with packet-split
(PS) mode disabled, and we have collected some performance data with it.
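[Editor's sketch] Below is a rough sketch of the per-device hook implied by
the description above. The struct names loosely follow the changelog hints
(page_ctor was renamed to mp_port in netdevice.h), but the layout shown
here is only an illustrative guess, not the code from the patches.

    #include <linux/netdevice.h>

    struct skb_ext_page {
            struct page *page;      /* pinned guest/user page                */
            void        *ctx;       /* context used to complete the guest rx */
    };

    struct mp_port {
            struct net_device *dev;
            /* called from the modified __alloc_skb()/alloc_page() paths to
             * hand out a guest buffer instead of a freshly allocated one */
            struct skb_ext_page *(*ctor)(struct mp_port *port,
                                         struct sk_buff *skb, int size);
    };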
  
Using netperf with GSO/TSO disabled, a 10G NIC, and packet split mode
disabled, with the raw socket case compared to vhost:

bandwidth goes from 1.1 Gbps to 1.7 Gbps
CPU %     goes from 120%-140% to 140%-160%

We are now trying to get decent performance data with advanced features.
Do you have any other concerns with this solution? 

 What's being proposed here looks a bit over-engineered.

This is an attempt to reduce overhead for virtio (paravirtualization).
'Don't use PV' is kind of an alternative, but I do not
think it's a simpler one.

-- 
MST


[RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-04-25 Thread xiaohui . xin
We provide a zero-copy method with which the driver side may get external
buffers to DMA into. Here 'external' means the driver does not use kernel
space to allocate skb buffers. Currently the external buffers can come
from the guest virtio-net driver.

The idea is simple: just pin the guest VM user space and then let the
host NIC driver have the chance to DMA to it directly.
The patches are based on the vhost-net backend driver. We add a device
which provides proto_ops (sendmsg/recvmsg) to vhost-net, so it can
send/recv directly to/from the NIC driver. A KVM guest that uses the
vhost-net backend may bind any ethX interface on the host side to get
copyless data transfer through the guest virtio-net frontend.
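[Editor's sketch] As a minimal illustration (assuming the usual 2.6.3x
proto_ops member signatures; the mp_* handler names are hypothetical, not
taken from the patches), such a device would plug into vhost-net roughly
the way tap does, by exposing a socket whose proto_ops carry
sendmsg/recvmsg:

    #include <linux/net.h>
    #include <linux/socket.h>

    /* defined elsewhere in the hypothetical mp device */
    int mp_sendmsg(struct kiocb *iocb, struct socket *sock,
                   struct msghdr *m, size_t total_len);
    int mp_recvmsg(struct kiocb *iocb, struct socket *sock,
                   struct msghdr *m, size_t total_len, int flags);

    static const struct proto_ops mp_socket_ops = {
            .family  = AF_UNSPEC,
            .sendmsg = mp_sendmsg,  /* tx: wrap guest buffers, hand to the NIC */
            .recvmsg = mp_recvmsg,  /* rx: complete pending guest rx requests  */
    };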

patch 01-12: net core changes.
patch 13-17: new device as an interface to manipulate external buffers.
patch 18:    vhost-net changes.

The guest virtio-net driver submits multiple requests through the
vhost-net backend driver to the kernel. The requests are queued and then
completed after the corresponding actions in h/w are done.

For read, user-space buffers are dispensed to the NIC driver for rx when
a page constructor API is invoked; that is, NICs can allocate user buffers
from a page constructor. We add a hook in the netif_receive_skb() function
to intercept the incoming packets and notify the zero-copy device, as
sketched below.
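[Editor's sketch] A minimal sketch of that intercept, assuming hypothetical
helpers (mp_device_bound(), mp_device_receive()); the actual hook in the
patches may look different:

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* provided by the hypothetical zero-copy (mp) device */
    bool mp_device_bound(struct net_device *dev);
    void mp_device_receive(struct net_device *dev, struct sk_buff *skb);

    /* Called early in netif_receive_skb(): divert skbs whose buffers belong
     * to a bound guest so the pending guest rx request can be completed. */
    static inline bool mp_try_intercept_rx(struct sk_buff *skb)
    {
            struct net_device *dev = skb->dev;

            if (dev && mp_device_bound(dev)) {
                    mp_device_receive(dev, skb);   /* consume the skb */
                    return true;                   /* skip the normal stack */
            }
            return false;                          /* normal receive path */
    }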

For write, the zero-copy device may allocate a new host skb, put the
payload on skb_shinfo(skb)->frags, and copy the header to skb->data. The
request remains pending until the skb is transmitted by h/w.
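[Editor's sketch] An illustrative sketch of that tx path follows; the
helper name and the fixed-size frags are assumptions for the example, not
the patch itself. Only the header bytes are copied, while the payload stays
in the pinned guest pages attached as frags:

    #include <linux/skbuff.h>
    #include <linux/netdevice.h>
    #include <linux/string.h>

    static struct sk_buff *mp_build_tx_skb(struct net_device *dev,
                                           const void *hdr, int hdr_len,
                                           struct page **pages, int npages,
                                           int frag_len)
    {
            struct sk_buff *skb = netdev_alloc_skb(dev, hdr_len);
            int i;

            if (!skb)
                    return NULL;

            /* small copy: only the packet header lands in kernel memory */
            memcpy(skb_put(skb, hdr_len), hdr, hdr_len);

            /* zero-copy payload: pinned guest pages become skb frags */
            for (i = 0; i < npages; i++)
                    skb_fill_page_desc(skb, i, pages[i], 0, frag_len);

            skb->data_len  = npages * frag_len;
            skb->len      += skb->data_len;
            skb->truesize += skb->data_len;
            return skb;
    }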

Here, we have considered two ways to utilize the page constructor API to
dispense the user buffers.

One:    Modify __alloc_skb() a bit so that it only allocates the sk_buff
        structure itself, while the data pointer points to a user buffer
        coming from a page constructor API. The shinfo of the skb then
        also comes from the guest. When a packet is received from
        hardware, skb->data is filled directly by h/w. This is what we
        have implemented.

Pros:   We can avoid any copy here.
Cons:   The guest virtio-net driver needs to allocate its skbs in almost
        the same way as the host NIC drivers, i.e. with the
        netdev_alloc_skb() size and the same reserved space at the head
        of the skb (see the sketch below). Many NIC drivers match the
        guest and are fine with this, but some of the latest NIC drivers
        reserve special room in the skb head. To deal with that, we
        suggest providing a method in the guest virtio-net driver to ask
        the NIC driver for the parameters we are interested in once we
        know which device is bound for zero-copy, and then have the
        guest follow them. Is that reasonable?
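[Editor's sketch] A tiny illustration of the layout constraint in the Cons
above; the macro names and the use of the default NET_SKB_PAD/NET_IP_ALIGN
headroom are assumptions, since the exact reserve each host NIC driver
wants can differ:

    #include <linux/skbuff.h>
    #include <linux/if_ether.h>

    /* For approach One, a guest rx buffer must reserve the same headroom
     * the host driver would normally get from netdev_alloc_skb() and
     * skb_reserve(), so the host can point skb->data straight at the
     * guest buffer. */
    #define GUEST_RX_HEADROOM   (NET_SKB_PAD + NET_IP_ALIGN)
    #define GUEST_RX_BUF_SIZE   (GUEST_RX_HEADROOM + ETH_HLEN + ETH_DATA_LEN)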

Two:    Modify the driver to get user buffers allocated from a page
        constructor API (substituting alloc_page()); the user buffers are
        used as payload buffers and are filled by h/w directly when a
        packet is received. The driver should associate the pages with
        the skb (skb_shinfo(skb)->frags). For the head buffer, let the
        host allocate the skb and have h/w fill it. After that, the data
        filled into the host skb header is copied into the guest header
        buffer, which is submitted together with the payload buffers.

Pros:   We care less about how the guest or the host allocates its
        buffers.
Cons:   We still need a small copy here for the skb header (sketched
        below).
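[Editor's sketch] A minimal sketch of that remaining header copy; the
helper and the way the guest header buffer is passed in are hypothetical,
shown only to make the cost concrete:

    #include <linux/skbuff.h>
    #include <linux/uaccess.h>
    #include <linux/kernel.h>

    /* Copy only the linear header part of the skb into the guest-supplied
     * header buffer; the payload frags already live in guest pages. */
    static int mp_copy_header_to_guest(struct sk_buff *skb,
                                       void __user *guest_hdr, size_t len)
    {
            size_t n = min_t(size_t, len, skb_headlen(skb));

            if (copy_to_user(guest_hdr, skb->data, n))
                    return -EFAULT;
            return (int)n;
    }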

We are not sure which way is better here. This is the first thing we want
to get comments on from the community. We would like the modifications to
the network part to be generic, so they are not used only by the vhost-net
backend but can also be used by a user application once the zero-copy
device provides async read/write operations later.

Please give comments especially for the network part modifications.


We provide multiple submits and asynchronous notification to
vhost-net too.

Our goal is to improve the bandwidth and reduce the CPU usage.
Exact performance data will be provided later. But in a simple test with
netperf, we found that bandwidth goes up and CPU % goes up too, though the
bandwidth increase ratio is much larger than the CPU % increase ratio.

What we have not done yet:
packet split support
GRO support
performance tuning

what we have done in v1:
polish the RCU usage
deal with write logging in asynchronous mode in vhost
add a notifier block for the mp device
rename page_ctor to mp_port in netdevice.h to make it look more generic
add mp_dev_change_flags() for the mp device to change NIC state
add CONFIG_VHOST_MPASSTHRU to limit the usage when the module is not loaded
a small fix for a missing dev_put() on failure
use a dynamic minor instead of a static minor number
add a __KERNEL__ guard to mp_get_sock()

what we have done in v2:

remove most of the RCU usage, since the ctor pointer is only changed by
the BIND/UNBIND ioctls, and during that time the NIC will be stopped to
get good

Re: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-04-25 Thread David Miller
From: xiaohui@intel.com
Date: Sun, 25 Apr 2010 17:20:06 +0800

 The idea is simple: just pin the guest VM user space and then let the
 host NIC driver have the chance to DMA to it directly.

Isn't it much easier to map the RX ring of the network device into the
guest's address space, have DMA map calls translate guest addresses to
physical/DMA addresses as well as do all of this crazy page pinning
stuff, and provide the translations and protections via the IOMMU?

What's being proposed here looks a bit over-engineered.


Re: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-04-25 Thread Michael S. Tsirkin
On Sun, Apr 25, 2010 at 02:55:29AM -0700, David Miller wrote:
 From: xiaohui@intel.com
 Date: Sun, 25 Apr 2010 17:20:06 +0800
 
  The idea is simple: just pin the guest VM user space and then let the
  host NIC driver have the chance to DMA to it directly.
 
 Isn't it much easier to map the RX ring of the network device into the
 guest's address space, have DMA map calls translate guest addresses to
 physical/DMA addresses as well as do all of this crazy page pinning
 stuff, and provide the translations and protections via the IOMMU?

This means the guest would need to know how the specific network device
works, so we won't be able to, for example, move a guest between different
hosts. There are other problems: many physical systems do not have an
IOMMU, some guest OSes do not support DMA map calls, and doing a VM exit
on each DMA map call might turn out to be very slow. And so on.

 What's being proposed here looks a bit over-engineered.

This is an attempt to reduce overhead for virtio (paravirtualization).
'Don't use PV' is kind of an alternative, but I do not
think it's a simpler one.

-- 
MST


Re: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.

2010-04-25 Thread Michael S. Tsirkin
On Sun, Apr 25, 2010 at 05:20:06PM +0800, xiaohui@intel.com wrote:
 We provide a zero-copy method with which the driver side may get external
 buffers to DMA into. Here 'external' means the driver does not use kernel
 space to allocate skb buffers. Currently the external buffers can come
 from the guest virtio-net driver.
 
 The idea is simple: just pin the guest VM user space and then let the
 host NIC driver have the chance to DMA to it directly.
 The patches are based on the vhost-net backend driver. We add a device
 which provides proto_ops (sendmsg/recvmsg) to vhost-net, so it can
 send/recv directly to/from the NIC driver. A KVM guest that uses the
 vhost-net backend may bind any ethX interface on the host side to get
 copyless data transfer through the guest virtio-net frontend.
 
 patch 01-12: net core changes.
 patch 13-17: new device as an interface to manipulate external buffers.
 patch 18:    vhost-net changes.
 
 The guest virtio-net driver submits multiple requests through the
 vhost-net backend driver to the kernel. The requests are queued and then
 completed after the corresponding actions in h/w are done.
 
 For read, user-space buffers are dispensed to the NIC driver for rx when
 a page constructor API is invoked; that is, NICs can allocate user buffers
 from a page constructor. We add a hook in the netif_receive_skb() function
 to intercept the incoming packets and notify the zero-copy device.
 
 For write, the zero-copy device may allocate a new host skb, put the
 payload on skb_shinfo(skb)->frags, and copy the header to skb->data. The
 request remains pending until the skb is transmitted by h/w.
 
 Here, we have considered two ways to utilize the page constructor API to
 dispense the user buffers.
 
 One:    Modify __alloc_skb() a bit so that it only allocates the sk_buff
         structure itself, while the data pointer points to a user buffer
         coming from a page constructor API. The shinfo of the skb then
         also comes from the guest. When a packet is received from
         hardware, skb->data is filled directly by h/w. This is what we
         have implemented.

 Pros:   We can avoid any copy here.
 Cons:   The guest virtio-net driver needs to allocate its skbs in almost
         the same way as the host NIC drivers, i.e. with the
         netdev_alloc_skb() size and the same reserved space at the head
         of the skb. Many NIC drivers match the guest and are fine with
         this, but some of the latest NIC drivers reserve special room in
         the skb head. To deal with that, we suggest providing a method in
         the guest virtio-net driver to ask the NIC driver for the
         parameters we are interested in once we know which device is
         bound for zero-copy, and then have the guest follow them. Is that
         reasonable?

Do you still do this?

 Two:    Modify the driver to get user buffers allocated from a page
         constructor API (substituting alloc_page()); the user buffers are
         used as payload buffers and are filled by h/w directly when a
         packet is received. The driver should associate the pages with
         the skb (skb_shinfo(skb)->frags). For the head buffer, let the
         host allocate the skb and have h/w fill it. After that, the data
         filled into the host skb header is copied into the guest header
         buffer, which is submitted together with the payload buffers.

 Pros:   We care less about how the guest or the host allocates its
         buffers.
 Cons:   We still need a small copy here for the skb header.
 
 We are not sure which way is better here. This is the first thing we want
 to get comments on from the community. We would like the modifications to
 the network part to be generic, so they are not used only by the vhost-net
 backend but can also be used by a user application once the zero-copy
 device provides async read/write operations later.

I commented on this in the past. Do you still want comments?

 Please give comments especially for the network part modifications.
 
 
 We provide multiple submits and asynchronous notification to
 vhost-net too.
 
 Our goal is to improve the bandwidth and reduce the CPU usage.
 Exact performance data will be provided later. But in a simple test with
 netperf, we found that bandwidth goes up and CPU % goes up too, though the
 bandwidth increase ratio is much larger than the CPU % increase ratio.
 
 What we have not done yet:
   packet split support
   GRO support
   performance tuning
 
 what we have done in v1:
   polish the RCU usage
   deal with write logging in asynchronous mode in vhost
   add a notifier block for the mp device
   rename page_ctor to mp_port in netdevice.h to make it look more generic
   add mp_dev_change_flags() for the mp device to change NIC state
   add CONFIG_VHOST_MPASSTHRU to limit the usage when the module is not loaded
   a small fix for a missing dev_put() on failure
   use a dynamic minor instead of a static minor number
   add a __KERNEL__ guard to mp_get_sock()
 
 what we have done in v2: