RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-22 Thread Xin, Xiaohui
Michael,

Yes, I think this packet split mode probably maps well to mergeable buffer
support. Note that
1. Not all devices support large packets in this way, others might map
   to indirect buffers better

Are the indirect buffers meant to deal with skb->frag_list?

   So we have to figure out how migration is going to work
Yes, different guest virtio-net drivers may support different features.
Does qemu migration currently work when the virtio-net drivers support
different features?

2. It's up to guest driver whether to enable features such as
   mergeable buffers and indirect buffers
   So we have to figure out how to notify guest which mode
   is optimal for a given device
Yes. When a device is bound, the mp device can query the capabilities from the
driver. Actually, there is already a structure in the mp device that can do
this; we can add some fields to support more.
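For illustration only, the kind of capability structure this could grow into
might look like the following (the field names are hypothetical, not taken from
the posted patches):

    #include <linux/types.h>

    /* Hypothetical sketch: what the mp device could query from the bound
     * NIC driver at bind time.
     */
    struct mp_port_caps {
            unsigned int    hdr_reserve;    /* headroom the NIC driver reserves */
            unsigned int    buf_size;       /* rx buffer size the driver expects */
            unsigned int    frags_per_skb;  /* max pages hooked onto one skb */
            bool            mergeable;      /* driver prefers mergeable buffers */
    };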

3. We don't want to depend on jumbo frames for decent performance
   So we probably should support GSO/GRO
GSO is for the tx side, right? I think the driver can handle that itself.
For GRO, I'm not sure whether it is easy or not. Basically, the mp device we
support now does what a raw socket does: the packets do not go through the
host stack.
-- 
MST


Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-22 Thread Michael S. Tsirkin
On Thu, Apr 22, 2010 at 04:57:56PM +0800, Xin, Xiaohui wrote:
 Michael,
 
 Yes, I think this packet split mode probably maps well to mergeable buffer
 support. Note that
 1. Not all devices support large packets in this way, others might map
to indirect buffers better
 
 Are the indirect buffers meant to deal with skb->frag_list?

We currently use skb->frags.
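As a rough sketch of what building a packet out of skb->frags looks like on the
receive side (this is the generic kernel helper, not the vhost/mp code itself):

    #include <linux/skbuff.h>

    /* Hang one received page off skb->frags and account for its length. */
    static void rx_attach_page(struct sk_buff *skb, struct page *page,
                               int offset, int len)
    {
            skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, offset, len);
            skb->len      += len;
            skb->data_len += len;
            skb->truesize += len;
    }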

So we have to figure out how migration is going to work
 Yes, different guest virtio-net drivers may support different features.
 Does qemu migration currently work when the virtio-net drivers support
 different features?

For now, you must have identical feature sets for migration to work.
As long as we manage the buffers in software, we can always make
features match.

 2. It's up to guest driver whether to enable features such as
mergeable buffers and indirect buffers
So we have to figure out how to notify guest which mode
is optimal for a given device
 Yes. When a device is bound, the mp device can query the capabilities from
 the driver. Actually, there is already a structure in the mp device that can
 do this; we can add some fields to support more.
 
 3. We don't want to depend on jumbo frames for decent performance
So we probably should support GSO/GRO
 GSO is for the tx side, right? I think the driver can handle that itself.
 For GRO, I'm not sure whether it is easy or not. Basically, the mp device we
 support now does what a raw socket does: the packets do not go through the
 host stack.

See commit bfd5f4a3d605e0f6054df0b59fe0907ff7e696d3
(it doesn't currently work with vhost net, but that's
 a separate story).
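Assuming that commit is the af_packet PACKET_VNET_HDR support, the point is that
a raw socket can carry a virtio_net_hdr in front of each frame, so GSO/csum
information survives the socket boundary; a minimal user-space sketch:

    #include <unistd.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>

    /* Open a raw packet socket that prepends a virtio_net_hdr to each packet. */
    static int open_vnet_hdr_socket(void)
    {
            int one = 1;
            int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

            if (fd < 0)
                    return -1;
            if (setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one))) {
                    close(fd);      /* kernel without PACKET_VNET_HDR support */
                    return -1;
            }
            return fd;
    }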

 -- 
 MST


Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-21 Thread Michael S. Tsirkin
On Tue, Apr 20, 2010 at 10:21:55AM +0800, Xin, Xiaohui wrote:
 Michael,
 
  What we have not done yet:
 packet split support
  
 What does this mean, exactly?
 We can support 1500 MTU, but for jumbo frames, since the vhost driver did not
 support mergeable buffers before, we cannot try it with multiple sg.
  
 I do not see why, vhost currently supports 64K buffers with indirect
 descriptors.
  
 The receive_skb() path in the guest virtio-net driver will merge the multiple
 sg entries into skb frags; how can indirect descriptors do that?
 
 See add_recvbuf_big.
 
 I don't mean that; add_recvbuf_big() is for buffer submission. I mean that
 when a packet is received, in receive_buf(), the mergeable-buffer path knows
 which received pages can be hooked into skb frags; it is receive_mergeable()
 which does this.

 When a NIC driver supports packet split mode, each ring descriptor contains
 an skb and a page. When a packet is received, if the status is not EOP, the
 page of the next descriptor is hooked onto the previous skb. We don't know in
 advance how many frags belong to one skb. So when the guest submits buffers,
 it should submit multiple pages, and on receive the guest should know which
 pages belong to one skb and hook them together. I think receive_mergeable()
 can do this, but I don't see how the big-packets path handles it. Am I
 missing something here?
 
 Thanks
 Xiaohui 


Yes, I think this packet split mode probably maps well to mergeable buffer
support. Note that
1. Not all devices support large packets in this way, others might map
   to indirect buffers better
   So we have to figure out how migration is going to work
2. It's up to guest driver whether to enable features such as
   mergeable buffers and indirect buffers
   So we have to figure out how to notify guest which mode
   is optimal for a given device
3. We don't want to depend on jumbo frames for decent performance
   So we probably should support GSO/GRO
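Regarding point 2, one option is for the host to steer the guest purely through
which feature bits it offers; a rough sketch of the kind of guest-side selection
this implies (not the actual virtio_net.c logic, which keys big packets off the
GSO features):

    #include <linux/virtio.h>
    #include <linux/virtio_net.h>
    #include <linux/virtio_ring.h>

    struct rx_mode {
            bool mergeable_rx_bufs;  /* many small buffers merged per packet */
            bool big_packets;        /* few large buffers via indirect descriptors */
    };

    /* Pick a receive mode from the features the host chose to offer. */
    static void choose_rx_mode(struct virtio_device *vdev, struct rx_mode *m)
    {
            m->mergeable_rx_bufs = virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF);
            m->big_packets = !m->mergeable_rx_bufs &&
                             virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
    }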

-- 
MST


RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-19 Thread Xin, Xiaohui
 Michael,
  The idea is simple: just pin the guest VM user space and then
  let the host NIC driver have the chance to DMA directly into it.
  The patches are based on the vhost-net backend driver. We add a device
  which provides proto_ops such as sendmsg/recvmsg to vhost-net to
  send/recv directly to/from the NIC driver. A KVM guest that uses the
  vhost-net backend may bind any ethX interface on the host side to
  get copyless data transfer thru the guest virtio-net frontend.

  The scenario is like this:

  The guest virtio-net driver submits multiple requests thru the vhost-net
  backend driver to the kernel. The requests are queued and then
  completed after the corresponding actions in h/w are done.

  For read, user space buffers are dispensed to the NIC driver for rx when
  a page constructor API is invoked; that means NICs can allocate user buffers
  from a page constructor. We add a hook in the netif_receive_skb() function
  to intercept the incoming packets and notify the zero-copy device.
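  As an illustration of the kind of check such a hook does (mp_port is the
  field the patches add to struct net_device; mp_port_receive() is only a
  placeholder name):

    /* Illustrative only: called early in netif_receive_skb();
     * mp_port_receive() stands in for the mp device's rx entry point.
     */
    static inline bool mp_intercept_rx(struct sk_buff *skb)
    {
            if (!skb->dev->mp_port)                   /* not bound for zero-copy */
                    return false;
            mp_port_receive(skb->dev->mp_port, skb);  /* notify the zero-copy device */
            return true;                              /* consumed, skip the host stack */
    }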
  
  For write, the zero-copy device may allocate a new host skb, put the
  payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
  The request remains pending until the skb is transmitted by h/w.

  Here, we have considered two ways to utilize the page constructor
  API to dispense the user buffers.
  
  One:  Modify the __alloc_skb() function a bit so that it only allocates
        the sk_buff structure, with the data pointer pointing to a user
        buffer which comes from a page constructor API. The shinfo of the
        skb then also comes from the guest. When a packet is received from
        hardware, skb->data is filled directly by h/w. This is what we have
        done so far.

        Pros:   We can avoid any copy here.
        Cons:   The guest virtio-net driver needs to allocate the skb in
                almost the same way as the host NIC drivers, i.e. with the
                netdev_alloc_skb() size and the same reserved space at the
                head of the skb. Many NIC drivers match the guest and are
                fine with this, but some of the latest NIC drivers reserve
                special room in the skb head. To deal with that, we suggest
                providing a method in the guest virtio-net driver to ask the
                NIC driver for the parameters we are interested in, once we
                know which device has been bound for zero-copy. Then we ask
                the guest to do so. Is that reasonable?
 Unfortunately, this would break compatibility with existing virtio.
 This also complicates migration.  
 Do you mean that any modification to the guest virtio-net driver will break
 compatibility? We tried enlarging virtio_net_config to contain the two
 parameters and adding a VIRTIO_NET_F_PASSTHRU feature flag; virtnet_probe()
 will check the feature flag, get the parameters, and then the virtio-net
 driver uses them to allocate buffers. How about this?
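 A sketch of what that proposal could look like (VIRTIO_NET_F_PASSTHRU and the
 two new fields are only the idea under discussion, not part of the existing
 virtio spec, and the bit number is made up):

    #include <linux/types.h>

    #define VIRTIO_NET_F_PASSTHRU   22      /* illustrative feature bit */

    struct virtio_net_config {
            __u8  mac[6];
            __u16 status;
            /* valid only if VIRTIO_NET_F_PASSTHRU was negotiated */
            __u16 hdr_reserve;      /* headroom the bound host NIC reserves */
            __u16 buf_len;          /* rx buffer size the bound host NIC expects */
    } __attribute__((packed));

 virtnet_probe() would then read the two new fields through vdev->config->get()
 before setting up its receive buffers.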

This means that we can't, for example, live-migrate between different systems
without flushing outstanding buffers.

Ok. What we are thinking about now is to do something with skb_reserve():
if the device is bound by mp, then skb_reserve() will do nothing for it.
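A minimal sketch of that idea (assuming skb->dev is already set when
skb_reserve() runs, which is not true for every driver path, and using the
mp_port field the patches add to struct net_device):

    /* Sketch only: skip headroom tweaks for zero-copy (mp) buffers, whose
     * layout is fixed by the guest-provided pages.
     */
    static inline void skb_reserve(struct sk_buff *skb, int len)
    {
            if (skb->dev && skb->dev->mp_port)
                    return;
            skb->data += len;
            skb->tail += len;
    }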

 What is the room in skb head used for?
 I'm not sure, but the latest ixgbe driver does this; it reserves 32 bytes
 in addition to NET_IP_ALIGN.

Looking at code, this seems to do with alignment - could just be
a performance optimization.

  Two:  Modify the driver to get user buffers allocated from a page
        constructor API (substituting alloc_page()); the user buffers are
        used as payload buffers and filled by h/w directly when a packet is
        received. The driver should associate the pages with the skb
        (skb_shinfo(skb)->frags). For the head buffer, let the host allocate
        the skb and h/w fill it. After that, the data filled into the host
        skb header is copied into the guest header buffer, which is
        submitted together with the payload buffer.

        Pros:   We care less about how the guest or host allocates its
                buffers.
        Cons:   We still need a small copy here for the skb header.
  
  We are not sure which way is better here.
 The obvious question would be whether you see any speed difference
 with the two approaches. If no, then the second approach would be
 better.
 
 I remember the second approach being a bit slower at 1500 MTU,
 but we did not test it much.

Well, that's an important datapoint. By the way, you'll need
header copy to activate LRO in host, so that's a good
reason to go with option 2 as well.


  This is the first thing on which we want
  to get comments from the community. We hope the modification to the
  network part will be generic, used not only by the vhost-net backend but
  also by a user application, once the zero-copy device provides async
  read/write operations later.
  
  Please give comments especially for the network part modifications.
  
  
  We provide multiple submits and asynchronous notification to
 vhost-net too.
  
  Our goal is to improve the bandwidth and reduce the CPU usage.
  

Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-19 Thread Michael S. Tsirkin
On Mon, Apr 19, 2010 at 06:05:17PM +0800, Xin, Xiaohui wrote:
  Michael,
   The idea is simple, just to pin the guest VM user space and then
   let host NIC driver has the chance to directly DMA to it. 
   The patches are based on vhost-net backend driver. We add a device
   which provides proto_ops as sendmsg/recvmsg to vhost-net to
   send/recv directly to/from the NIC driver. KVM guest who use the
   vhost-net backend may bind any ethX interface in the host side to
   get copyless data transfer thru guest virtio-net frontend.
   
   The scenario is like this:
   
   The guest virtio-net driver submits multiple requests thru vhost-net
   backend driver to the kernel. And the requests are queued and then
   completed after corresponding actions in h/w are done.
   
   For read, user space buffers are dispensed to NIC driver for rx when
   a page constructor API is invoked. Means NICs can allocate user buffers
   from a page constructor. We add a hook in netif_receive_skb() function
   to intercept the incoming packets, and notify the zero-copy device.
   
   For write, the zero-copy device may allocate a new host skb, put the
   payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
   The request remains pending until the skb is transmitted by h/w.
   
   Here, we have ever considered 2 ways to utilize the page constructor
   API to dispense the user buffers.
   
   One:Modify __alloc_skb() function a bit, it can only allocate a 
   structure of sk_buff, and the data pointer is pointing to a 
   user buffer which is coming from a page constructor API.
   Then the shinfo of the skb is also from guest.
   When a packet is received from hardware, skb->data is filled
   directly by h/w. This is what we have done so far.
   
   Pros:   We can avoid any copy here.
   Cons:   Guest virtio-net driver needs to allocate skb as almost
   the same method with the host NIC drivers, say the size
   of netdev_alloc_skb() and the same reserved space in the
   head of skb. Many NIC drivers are the same with guest 
   and
   ok for this. But some of the latest NIC drivers reserve 
   special
   room in skb head. To deal with it, we suggest to provide
   a method in guest virtio-net driver to ask for parameter
   we interest from the NIC driver when we know which 
   device 
   we have bind to do zero-copy. Then we ask guest to do 
   so.
   Is that reasonable?
  Unfortunately, this would break compatibility with existing virtio.
  This also complicates migration.  
  You mean any modification to the guest virtio-net driver will break the
  compatibility? We tried to enlarge the virtio_net_config to contains the
  2 parameter, and add one VIRTIO_NET_F_PASSTHRU flag, virtionet_probe()
  will check the feature flag, and get the parameters, then virtio-net 
  driver use
  it to allocate buffers. How about this?
 
 This means that we can't, for example, live-migrate between different systems
 without flushing outstanding buffers.
 
 Ok. What we have thought about now is to do something with skb_reserve().
 If the device is binded by mp, then skb_reserve() will do nothing with it.
 
  What is the room in skb head used for?
  I'm not sure, but the latest ixgbe driver does this, it reserves 32 bytes 
  compared to
  NET_IP_ALIGN.
 
 Looking at code, this seems to do with alignment - could just be
 a performance optimization.
 
   Two:Modify driver to get user buffer allocated from a page 
   constructor
   API(to substitute alloc_page()), the user buffer are used as 
   payload
   buffers and filled by h/w directly when packet is received. 
   Driver
   should associate the pages with the skb (skb_shinfo(skb)->frags). 
   For 
   the head buffer side, let host allocates skb, and h/w fills it. 
   After that, the data filled in host skb header will be copied 
   into
   guest header buffer which is submitted together with the 
   payload buffer.
   
   Pros:   We could less care the way how guest or host allocates 
   their
   buffers.
   Cons:   We still need a bit copy here for the skb header.
   
   We are not sure which way is the better here. 
  The obvious question would be whether you see any speed difference
  with the two approaches. If no, then the second approach would be
  better.
  
  I remember the second approach is a bit slower in 1500MTU. 
  But we did not tested too much.
 
 Well, that's an important datapoint. By the way, you'll need
 header copy to activate LRO in host, so that's a good
 reason to go with option 2 as well.
 
 
   This is the first thing we want
   to get comments from the community. We wish the modification to the 
   network
   part will be generic which not used by vhost-net backend only, but a 

RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-19 Thread Xin, Xiaohui
Michael,

 What we have not done yet:
  packet split support
 
What does this mean, exactly?
 We can support 1500 MTU, but for jumbo frames, since the vhost driver did not
 support mergeable buffers before, we cannot try it with multiple sg.
 
I do not see why, vhost currently supports 64K buffers with indirect
descriptors.
 
 The receive_skb() path in the guest virtio-net driver will merge the multiple
 sg entries into skb frags; how can indirect descriptors do that?

See add_recvbuf_big.

I don't mean that; add_recvbuf_big() is for buffer submission. I mean that when
a packet is received, in receive_buf(), the mergeable-buffer path knows which
received pages can be hooked into skb frags; it is receive_mergeable() which
does this.

When a NIC driver supports packet split mode, each ring descriptor contains an
skb and a page. When a packet is received, if the status is not EOP, the page
of the next descriptor is hooked onto the previous skb. We don't know in
advance how many frags belong to one skb. So when the guest submits buffers, it
should submit multiple pages, and on receive the guest should know which pages
belong to one skb and hook them together. I think receive_mergeable() can do
this, but I don't see how the big-packets path handles it. Am I missing
something here?
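For reference, a simplified sketch of what the mergeable path does on the guest
side (loosely modeled on receive_mergeable() in drivers/net/virtio_net.c of
that time; error handling omitted):

    #include <linux/virtio.h>
    #include <linux/skbuff.h>

    /* Pull the remaining one-page buffers for this packet off the rx
     * virtqueue and hook each page into the head skb's frags.
     */
    static void merge_rx_pages(struct virtqueue *vq, struct sk_buff *skb,
                               int num_buf)
    {
            while (--num_buf) {
                    unsigned int len;
                    struct page *page = vq->vq_ops->get_buf(vq, &len);

                    if (!page)
                            break;  /* fewer buffers than the header announced */
                    skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
                                       page, 0, len);
                    skb->len      += len;
                    skb->data_len += len;
                    skb->truesize += PAGE_SIZE;
            }
    }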

Thanks
Xiaohui 



RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-15 Thread Xin, Xiaohui

Michael,
 The idea is simple: just pin the guest VM user space and then
 let the host NIC driver have the chance to DMA directly into it.
 The patches are based on the vhost-net backend driver. We add a device
 which provides proto_ops such as sendmsg/recvmsg to vhost-net to
 send/recv directly to/from the NIC driver. A KVM guest that uses the
 vhost-net backend may bind any ethX interface on the host side to
 get copyless data transfer thru the guest virtio-net frontend.

 The scenario is like this:

 The guest virtio-net driver submits multiple requests thru the vhost-net
 backend driver to the kernel. The requests are queued and then
 completed after the corresponding actions in h/w are done.

 For read, user space buffers are dispensed to the NIC driver for rx when
 a page constructor API is invoked; that means NICs can allocate user buffers
 from a page constructor. We add a hook in the netif_receive_skb() function
 to intercept the incoming packets and notify the zero-copy device.
 
 For write, the zero-copy device may allocate a new host skb, put the
 payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
 The request remains pending until the skb is transmitted by h/w.

 Here, we have considered two ways to utilize the page constructor
 API to dispense the user buffers.
 
 One:  Modify the __alloc_skb() function a bit so that it only allocates
       the sk_buff structure, with the data pointer pointing to a user
       buffer which comes from a page constructor API. The shinfo of the
       skb then also comes from the guest. When a packet is received from
       hardware, skb->data is filled directly by h/w. This is what we have
       done so far.

       Pros:   We can avoid any copy here.
       Cons:   The guest virtio-net driver needs to allocate the skb in
               almost the same way as the host NIC drivers, i.e. with the
               netdev_alloc_skb() size and the same reserved space at the
               head of the skb. Many NIC drivers match the guest and are
               fine with this, but some of the latest NIC drivers reserve
               special room in the skb head. To deal with that, we suggest
               providing a method in the guest virtio-net driver to ask the
               NIC driver for the parameters we are interested in, once we
               know which device has been bound for zero-copy. Then we ask
               the guest to do so. Is that reasonable?

Unfortunately, this would break compatibility with existing virtio.
This also complicates migration. 

Do you mean that any modification to the guest virtio-net driver will break
compatibility? We tried enlarging virtio_net_config to contain the two
parameters and adding a VIRTIO_NET_F_PASSTHRU feature flag; virtnet_probe()
will check the feature flag, get the parameters, and then the virtio-net
driver uses them to allocate buffers. How about this?

What is the room in skb head used for?
I'm not sure, but the latest ixgbe driver does this; it reserves 32 bytes in
addition to NET_IP_ALIGN.

 Two:  Modify the driver to get user buffers allocated from a page constructor
       API (substituting alloc_page()); the user buffers are used as payload
       buffers and filled by h/w directly when a packet is received. The
       driver should associate the pages with the skb (skb_shinfo(skb)->frags).
       For the head buffer, let the host allocate the skb and h/w fill it.
       After that, the data filled into the host skb header is copied into
       the guest header buffer, which is submitted together with the payload
       buffer.

       Pros:   We care less about how the guest or host allocates its buffers.
       Cons:   We still need a small copy here for the skb header.
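 To make the second option concrete, a sketch of the driver-side allocation
 change (the mp_port/ctor names are placeholders for whatever the mp device
 ends up exporting, not code from the posted patches):

    #include <linux/netdevice.h>
    #include <linux/gfp.h>

    /* Illustrative only: rx payload pages come from the zero-copy page
     * constructor when the device is bound, and from alloc_page() otherwise.
     */
    static struct page *rx_alloc_page(struct net_device *dev, gfp_t gfp)
    {
            if (dev->mp_port)
                    return dev->mp_port->ctor(dev->mp_port); /* guest-backed page */
            return alloc_page(gfp);                          /* normal host page */
    }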
 
 We are not sure which way is better here.

The obvious question would be whether you see any speed difference
with the two approaches. If no, then the second approach would be
better.

I remember the second approach being a bit slower at 1500 MTU,
but we did not test it much.

 This is the first thing on which we want
 to get comments from the community. We hope the modification to the network
 part will be generic, used not only by the vhost-net backend but also by a
 user application, once the zero-copy device provides async
 read/write operations later.
 
 Please give comments especially for the network part modifications.
 
 
 We provide multiple submits and asynchronous notification to
vhost-net too.
 
 Our goal is to improve the bandwidth and reduce the CPU usage.
 Exact performance data will be provided later. But for a simple
 test with netperf, we found bandwidth up and CPU % up too,
 but the bandwidth up ratio is much more than the CPU % up ratio.
 
 What we have not done yet:
  packet split support

What does this mean, exactly?
We can support 1500 MTU, but for jumbo frames, since the vhost driver did not
support mergeable buffers before, we cannot try it with multiple sg. A jumbo
frame will be split into 5 frags, each hooked onto one descriptor, so the user
buffer allocation is greatly dependent on how the guest 

Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-15 Thread Michael S. Tsirkin
On Thu, Apr 15, 2010 at 05:36:07PM +0800, Xin, Xiaohui wrote:
 
 Michael,
  The idea is simple, just to pin the guest VM user space and then
  let host NIC driver has the chance to directly DMA to it. 
  The patches are based on vhost-net backend driver. We add a device
  which provides proto_ops as sendmsg/recvmsg to vhost-net to
  send/recv directly to/from the NIC driver. KVM guest who use the
  vhost-net backend may bind any ethX interface in the host side to
  get copyless data transfer thru guest virtio-net frontend.
  
  The scenario is like this:
  
  The guest virtio-net driver submits multiple requests thru vhost-net
  backend driver to the kernel. And the requests are queued and then
  completed after corresponding actions in h/w are done.
  
  For read, user space buffers are dispensed to NIC driver for rx when
  a page constructor API is invoked. Means NICs can allocate user buffers
  from a page constructor. We add a hook in netif_receive_skb() function
  to intercept the incoming packets, and notify the zero-copy device.
  
  For write, the zero-copy device may allocate a new host skb, put the
  payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
  The request remains pending until the skb is transmitted by h/w.
  
  Here, we have ever considered 2 ways to utilize the page constructor
  API to dispense the user buffers.
  
  One:   Modify __alloc_skb() function a bit, it can only allocate a 
 structure of sk_buff, and the data pointer is pointing to a 
 user buffer which is coming from a page constructor API.
 Then the shinfo of the skb is also from guest.
  When a packet is received from hardware, skb->data is filled
  directly by h/w. This is what we have done so far.
  
 Pros:   We can avoid any copy here.
 Cons:   Guest virtio-net driver needs to allocate skb as almost
 the same method with the host NIC drivers, say the size
 of netdev_alloc_skb() and the same reserved space in the
 head of skb. Many NIC drivers are the same with guest and
  ok for this. But some of the latest NIC drivers reserve special
 room in skb head. To deal with it, we suggest to provide
 a method in guest virtio-net driver to ask for parameter
 we interest from the NIC driver when we know which device 
 we have bind to do zero-copy. Then we ask guest to do so.
 Is that reasonable?
 
 Unfortunately, this would break compatibility with existing virtio.
 This also complicates migration. 
 
 You mean any modification to the guest virtio-net driver will break the
 compatibility? We tried to enlarge the virtio_net_config to contains the
 2 parameter, and add one VIRTIO_NET_F_PASSTHRU flag, virtionet_probe()
 will check the feature flag, and get the parameters, then virtio-net driver 
 use
 it to allocate buffers. How about this?

This means that we can't, for example, live-migrate between different systems
without flushing outstanding buffers.

 What is the room in skb head used for?
 I'm not sure, but the latest ixgbe driver does this, it reserves 32 bytes 
 compared to
 NET_IP_ALIGN.

Looking at code, this seems to do with alignment - could just be
a performance optimization.

  Two:   Modify driver to get user buffer allocated from a page 
  constructor
 API(to substitute alloc_page()), the user buffer are used as payload
 buffers and filled by h/w directly when packet is received. Driver
  should associate the pages with the skb (skb_shinfo(skb)->frags). For 
 the head buffer side, let host allocates skb, and h/w fills it. 
 After that, the data filled in host skb header will be copied into
 guest header buffer which is submitted together with the payload buffer.
  
 Pros:   We could less care the way how guest or host allocates their
 buffers.
 Cons:   We still need a bit copy here for the skb header.
  
  We are not sure which way is the better here.
 
 The obvious question would be whether you see any speed difference
 with the two approaches. If no, then the second approach would be
 better.
 
 I remember the second approach is a bit slower in 1500MTU. 
 But we did not tested too much.

Well, that's an important datapoint. By the way, you'll need
header copy to activate LRO in host, so that's a good
reason to go with option 2 as well.

  This is the first thing we want
  to get comments from the community. We wish the modification to the network
  part will be generic which not used by vhost-net backend only, but a user
  application may use it as well when the zero-copy device may provides async
  read/write operations later.
  
  Please give comments especially for the network part modifications.
  
  
  We provide multiple submits and asynchronous notification to
 vhost-net too.
  
  Our goal is to improve the bandwidth and reduce the CPU usage.
  Exact performance data will be provided later. But for simple
  

Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-14 Thread Michael S. Tsirkin
On Fri, Apr 02, 2010 at 03:25:00PM +0800, xiaohui@intel.com wrote:
 The idea is simple, just to pin the guest VM user space and then
 let host NIC driver has the chance to directly DMA to it. 
 The patches are based on vhost-net backend driver. We add a device
 which provides proto_ops as sendmsg/recvmsg to vhost-net to
 send/recv directly to/from the NIC driver. KVM guest who use the
 vhost-net backend may bind any ethX interface in the host side to
 get copyless data transfer thru guest virtio-net frontend.
 
 The scenario is like this:
 
 The guest virtio-net driver submits multiple requests thru vhost-net
 backend driver to the kernel. And the requests are queued and then
 completed after corresponding actions in h/w are done.
 
 For read, user space buffers are dispensed to NIC driver for rx when
 a page constructor API is invoked. Means NICs can allocate user buffers
 from a page constructor. We add a hook in netif_receive_skb() function
 to intercept the incoming packets, and notify the zero-copy device.
 
 For write, the zero-copy device may allocate a new host skb, put the
 payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
 The request remains pending until the skb is transmitted by h/w.
 
 Here, we have ever considered 2 ways to utilize the page constructor
 API to dispense the user buffers.
 
 One:  Modify __alloc_skb() function a bit, it can only allocate a 
   structure of sk_buff, and the data pointer is pointing to a 
   user buffer which is coming from a page constructor API.
   Then the shinfo of the skb is also from guest.
   When a packet is received from hardware, skb->data is filled
   directly by h/w. This is what we have done so far.
 
   Pros:   We can avoid any copy here.
   Cons:   Guest virtio-net driver needs to allocate skb as almost
   the same method with the host NIC drivers, say the size
   of netdev_alloc_skb() and the same reserved space in the
   head of skb. Many NIC drivers are the same with guest and
   ok for this. But some of the latest NIC drivers reserve special
   room in skb head. To deal with it, we suggest to provide
   a method in guest virtio-net driver to ask for parameter
   we interest from the NIC driver when we know which device 
   we have bind to do zero-copy. Then we ask guest to do so.
   Is that reasonable?

Unfortunately, this would break compatibility with existing virtio.
This also complicates migration. What is the room in skb head used for?

 Two:  Modify driver to get user buffer allocated from a page constructor
   API(to substitute alloc_page()), the user buffer are used as payload
   buffers and filled by h/w directly when packet is received. Driver
   should associate the pages with the skb (skb_shinfo(skb)->frags). For 
   the head buffer side, let host allocates skb, and h/w fills it. 
   After that, the data filled in host skb header will be copied into
   guest header buffer which is submitted together with the payload buffer.
 
   Pros:   We could less care the way how guest or host allocates their
   buffers.
   Cons:   We still need a bit copy here for the skb header.
 
 We are not sure which way is the better here.

The obvious question would be whether you see any speed difference
with the two approaches. If no, then the second approach would be
better.

 This is the first thing we want
 to get comments from the community. We wish the modification to the network
 part will be generic which not used by vhost-net backend only, but a user
 application may use it as well when the zero-copy device may provides async
 read/write operations later.
 
 Please give comments especially for the network part modifications.
 
 
 We provide multiple submits and asynchronous notification to
 vhost-net too.
 
 Our goal is to improve the bandwidth and reduce the CPU usage.
 Exact performance data will be provided later. But for a simple
 test with netperf, we found bandwidth up and CPU % up too,
 but the bandwidth up ratio is much more than the CPU % up ratio.
 
 What we have not done yet:
   packet split support

What does this mean, exactly?

   To support GRO

And TSO/GSO?

   Performance tuning
 
 what we have done in v1:
   polish the RCU usage
   deal with write logging in asynchronous mode in vhost
   add a notifier block for the mp device
   rename page_ctor to mp_port in netdevice.h to make it look generic
   add mp_dev_change_flags() for the mp device to change NIC state
   add CONFIG_VHOST_MPASSTHRU to limit the usage when the module is not loaded
   a small fix for a missing dev_put on failure
   use a dynamic minor instead of a static minor number
   a __KERNEL__ protect for mp_get_sock()
 
 what we have done in v2:
   
   remove most of the RCU usage, since the ctor pointer is only
   changed by 

RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-06 Thread Xin, Xiaohui
Sridhar,

 The idea is simple, just to pin the guest VM user space and then
 let host NIC driver has the chance to directly DMA to it. 
 The patches are based on vhost-net backend driver. We add a device
 which provides proto_ops as sendmsg/recvmsg to vhost-net to
 send/recv directly to/from the NIC driver. KVM guest who use the
vhost-net backend may bind any ethX interface in the host side to
 get copyless data transfer thru guest virtio-net frontend.

What is the advantage of this approach compared to PCI-passthrough
of the host NIC to the guest?

PCI passthrough needs hardware support: a kind of IOMMU engine is needed to
translate guest physical addresses to host physical addresses.
And currently, a PCI passthrough device cannot survive live migration.

The zero-copy approach is a pure software solution. It doesn't need special
hardware support.
In theory, it can survive live migration.
 
Does this require pinning of the entire guest memory? Or only the
send/receive buffers?

We need only to pin the send/receive buffers.
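For reference, a minimal sketch of the per-request pinning this implies
(simplified; the real code must also unpin on completion and cope with partial
pins):

    #include <linux/mm.h>
    #include <linux/sched.h>

    /* Pin just the user pages backing one rx/tx request, not the whole guest. */
    static int pin_request_pages(unsigned long uaddr, size_t len,
                                 struct page **pages)
    {
            int npages = (offset_in_page(uaddr) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
            int got;

            down_read(&current->mm->mmap_sem);
            got = get_user_pages(current, current->mm, uaddr & PAGE_MASK, npages,
                                 1 /* write */, 0 /* force */, pages, NULL);
            up_read(&current->mm->mmap_sem);

            return got == npages ? got : -EFAULT;  /* caller releases pinned pages */
    }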

Thanks
Xiaohui

Thanks
Sridhar
 
 The scenario is like this:
 
 The guest virtio-net driver submits multiple requests thru vhost-net
 backend driver to the kernel. And the requests are queued and then
 completed after corresponding actions in h/w are done.
 
 For read, user space buffers are dispensed to NIC driver for rx when
 a page constructor API is invoked. Means NICs can allocate user buffers
 from a page constructor. We add a hook in netif_receive_skb() function
 to intercept the incoming packets, and notify the zero-copy device.
 
 For write, the zero-copy device may allocate a new host skb, put the
 payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
 The request remains pending until the skb is transmitted by h/w.
 
 Here, we have ever considered 2 ways to utilize the page constructor
 API to dispense the user buffers.
 
 One:  Modify __alloc_skb() function a bit, it can only allocate a 
   structure of sk_buff, and the data pointer is pointing to a 
   user buffer which is coming from a page constructor API.
   Then the shinfo of the skb is also from guest.
   When a packet is received from hardware, skb->data is filled
   directly by h/w. This is what we have done so far.
 
   Pros:   We can avoid any copy here.
   Cons:   Guest virtio-net driver needs to allocate skb as almost
   the same method with the host NIC drivers, say the size
   of netdev_alloc_skb() and the same reserved space in the
   head of skb. Many NIC drivers are the same with guest and
   ok for this. But some of the latest NIC drivers reserve special
   room in skb head. To deal with it, we suggest to provide
   a method in guest virtio-net driver to ask for parameter
   we interest from the NIC driver when we know which device 
   we have bind to do zero-copy. Then we ask guest to do so.
   Is that reasonable?
 
 Two:  Modify driver to get user buffer allocated from a page constructor
   API(to substitute alloc_page()), the user buffer are used as payload
   buffers and filled by h/w directly when packet is received. Driver
   should associate the pages with the skb (skb_shinfo(skb)->frags). For 
   the head buffer side, let host allocates skb, and h/w fills it. 
   After that, the data filled in host skb header will be copied into
   guest header buffer which is submitted together with the payload buffer.
 
   Pros:   We could less care the way how guest or host allocates their
   buffers.
   Cons:   We still need a bit copy here for the skb header.
 
 We are not sure which way is the better here. This is the first thing we want
 to get comments from the community. We wish the modification to the network
 part will be generic which not used by vhost-net backend only, but a user
 application may use it as well when the zero-copy device may provides async
 read/write operations later.
 
 Please give comments especially for the network part modifications.
 
 
 We provide multiple submits and asynchronous notification to
 vhost-net too.
 
 Our goal is to improve the bandwidth and reduce the CPU usage.
 Exact performance data will be provided later. But for a simple
 test with netperf, we found bandwidth up and CPU % up too,
 but the bandwidth up ratio is much more than the CPU % up ratio.
 
 What we have not done yet:
   packet split support
   To support GRO
   Performance tuning
 
 what we have done in v1:
   polish the RCU usage
   deal with write logging in asynchronous mode in vhost
   add a notifier block for the mp device
   rename page_ctor to mp_port in netdevice.h to make it look generic
   add mp_dev_change_flags() for the mp device to change NIC state
   add CONFIG_VHOST_MPASSTHRU to limit the usage when the module is not loaded
   a small fix for a missing dev_put on failure
   using dynamic minor 

Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-03 Thread Avi Kivity

On 04/03/2010 02:51 AM, Sridhar Samudrala wrote:

On Fri, 2010-04-02 at 15:25 +0800, xiaohui@intel.com wrote:
   

The idea is simple, just to pin the guest VM user space and then
let host NIC driver has the chance to directly DMA to it.
The patches are based on vhost-net backend driver. We add a device
which provides proto_ops as sendmsg/recvmsg to vhost-net to
send/recv directly to/from the NIC driver. KVM guest who use the
vhost-net backend may bind any ethX interface in the host side to
get copyless data transfer thru guest virtio-net frontend.
 

What is the advantage of this approach compared to PCI-passthrough
of the host NIC to the guest?
   


swapping/ksm/etc
independence from host hardware
live migration


Does this require pinning of the entire guest memory? Or only the
send/receive buffers?
   


If done correctly, just the send/receive buffers.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-02 Thread Sridhar Samudrala
On Fri, 2010-04-02 at 15:25 +0800, xiaohui@intel.com wrote:
 The idea is simple, just to pin the guest VM user space and then
 let host NIC driver has the chance to directly DMA to it. 
 The patches are based on vhost-net backend driver. We add a device
 which provides proto_ops as sendmsg/recvmsg to vhost-net to
 send/recv directly to/from the NIC driver. KVM guest who use the
 vhost-net backend may bind any ethX interface in the host side to
 get copyless data transfer thru guest virtio-net frontend.

What is the advantage of this approach compared to PCI-passthrough
of the host NIC to the guest?
Does this require pinning of the entire guest memory? Or only the
send/receive buffers?

Thanks
Sridhar
 
 The scenario is like this:
 
 The guest virtio-net driver submits multiple requests thru vhost-net
 backend driver to the kernel. And the requests are queued and then
 completed after corresponding actions in h/w are done.
 
 For read, user space buffers are dispensed to NIC driver for rx when
 a page constructor API is invoked. Means NICs can allocate user buffers
 from a page constructor. We add a hook in netif_receive_skb() function
 to intercept the incoming packets, and notify the zero-copy device.
 
 For write, the zero-copy device may allocate a new host skb, put the
 payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
 The request remains pending until the skb is transmitted by h/w.
 
 Here, we have ever considered 2 ways to utilize the page constructor
 API to dispense the user buffers.
 
 One:  Modify __alloc_skb() function a bit, it can only allocate a 
   structure of sk_buff, and the data pointer is pointing to a 
   user buffer which is coming from a page constructor API.
   Then the shinfo of the skb is also from guest.
   When a packet is received from hardware, skb->data is filled
   directly by h/w. This is what we have done so far.
 
   Pros:   We can avoid any copy here.
   Cons:   Guest virtio-net driver needs to allocate skb as almost
   the same method with the host NIC drivers, say the size
   of netdev_alloc_skb() and the same reserved space in the
   head of skb. Many NIC drivers are the same with guest and
   ok for this. But some of the latest NIC drivers reserve special
   room in skb head. To deal with it, we suggest to provide
   a method in guest virtio-net driver to ask for parameter
   we interest from the NIC driver when we know which device 
   we have bind to do zero-copy. Then we ask guest to do so.
   Is that reasonable?
 
 Two:  Modify driver to get user buffer allocated from a page constructor
   API(to substitute alloc_page()), the user buffer are used as payload
   buffers and filled by h/w directly when packet is received. Driver
   should associate the pages with the skb (skb_shinfo(skb)->frags). For 
   the head buffer side, let host allocates skb, and h/w fills it. 
   After that, the data filled in host skb header will be copied into
   guest header buffer which is submitted together with the payload buffer.
 
   Pros:   We could less care the way how guest or host allocates their
   buffers.
   Cons:   We still need a bit copy here for the skb header.
 
 We are not sure which way is the better here. This is the first thing we want
 to get comments from the community. We wish the modification to the network
 part will be generic which not used by vhost-net backend only, but a user
 application may use it as well when the zero-copy device may provides async
 read/write operations later.
 
 Please give comments especially for the network part modifications.
 
 
 We provide multiple submits and asynchronous notification to
 vhost-net too.
 
 Our goal is to improve the bandwidth and reduce the CPU usage.
 Exact performance data will be provided later. But for a simple
 test with netperf, we found bandwidth up and CPU % up too,
 but the bandwidth up ratio is much more than the CPU % up ratio.
 
 What we have not done yet:
   packet split support
   To support GRO
   Performance tuning
 
 what we have done in v1:
   polish the RCU usage
   deal with write logging in asynchronous mode in vhost
   add a notifier block for the mp device
   rename page_ctor to mp_port in netdevice.h to make it look generic
   add mp_dev_change_flags() for the mp device to change NIC state
   add CONFIG_VHOST_MPASSTHRU to limit the usage when the module is not loaded
   a small fix for a missing dev_put on failure
   use a dynamic minor instead of a static minor number
   a __KERNEL__ protect for mp_get_sock()
 
 what we have done in v2:
   
   remove most of the RCU usage, since the ctor pointer is only
   changed by BIND/UNBIND ioctl, and during that time, NIC will be
   stopped to get good cleanup(all outstanding requests are finished),
   so the