Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Zhangjie (HZ)
Thanks for your patient answer! :-)

On 2014/9/30 17:33, Michael S. Tsirkin wrote:
 On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
 Hi,
 There exists packet loss when we do packet forwarding in a VM,
 especially when we use DPDK to do the forwarding. Enlarging the vring
 can alleviate the problem.
 
 I think this has to do with the fact that dpdk disables
 checksum offloading; this has the side effect of disabling
 segmentation offloading.
 
 Please fix dpdk to support checksum offloading, and
 I think the problem will go away.
In some application scenarios, loss of UDP packets is not allowed,
and the UDP packets are always shorter than the MTU.
So, we need to support high-pps forwarding (e.g. 0.3M packets/s), and
offloading cannot fix that.
 
 
 But now vring size is limited to 1024 as follows:
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                             void (*handle_output)(VirtIODevice *, VirtQueue *))
 {
     ...
     if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
         abort();
 }
 ps: #define VIRTQUEUE_MAX_SIZE 1024
 I deleted that check and set the vring size to 2048;
 the VM can be started successfully, and the network is OK too.
 So, why is the vring size limited to 1024, and what is the influence?

 Thanks!
 
 There are several reasons for this limit.
 First, the guest has to allocate a descriptor buffer which is 16 * vring size.
 With a 1K size that is already 16K, which might be tricky to
 allocate contiguously if memory is fragmented when the device is
 added by hotplug.
That is very
 The second issue is that we want to be able to implement
 the device on top of the linux kernel, and
 a single descriptor might use all of
 the virtqueue. In this case we won't be able to pass the
 descriptor directly to linux as a single iov, since
 that is limited to 1K entries.
For the second issue, I wonder if it is OK to set the vring size of virtio-net
to larger than 1024;
for networking, there are at most 18 pages for an skb, so it will not exceed the iov limit.
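
For reference, a minimal sketch making the two limits above concrete (the descriptor layout follows the virtio split ring's vring_desc; UIO_MAXIOV is the Linux iovec cap referred to above):

#include <assert.h>
#include <stdint.h>

/* Split-ring descriptor as laid out by the virtio spec (vring_desc). */
struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* NEXT / WRITE / INDIRECT */
    uint16_t next;   /* index of the next descriptor in the chain */
};

/* Each entry is 16 bytes, so the descriptor table alone is
 * 16 * vring size: 16 KiB for a 1024-entry ring, 32 KiB for 2048,
 * and it must be physically contiguous in the guest. */
static_assert(sizeof(struct vring_desc) == 16, "descriptor is 16 bytes");

/* Host side: a request chained across the whole ring maps to one iovec
 * array, and Linux caps that at UIO_MAXIOV (1024) entries per call. */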
 
 -- 
 Best Wishes!
 Zhang Jie
 .
 

-- 
Best Wishes!
Zhang Jie




Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Michael S. Tsirkin
On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:
 Thanks for your patient answer! :-)
 
 On 2014/9/30 17:33, Michael S. Tsirkin wrote:
  On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
  Hi,
  There exists packet loss when we do packet forwarding in a VM,
  especially when we use DPDK to do the forwarding. Enlarging the vring
  can alleviate the problem.
  
  I think this has to do with the fact that dpdk disables
  checksum offloading; this has the side effect of disabling
  segmentation offloading.
  
  Please fix dpdk to support checksum offloading, and
  I think the problem will go away.
 In some application scenarios, loss of UDP packets is not allowed,
 and the UDP packets are always shorter than the MTU.
 So, we need to support high-pps forwarding (e.g. 0.3M packets/s), and
 offloading cannot fix that.

That's the point. With UFO you get larger than MTU UDP packets:
http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo

Additionally, checksum offloading reduces CPU utilization
and reduces the number of data copies, allowing higher pps
with smaller buffers.

It might look like queue depth helps performance for netperf, but in
real-life workloads the latency under load will suffer; with more
protocols implementing tunnelling on top of UDP, such extreme bufferbloat
will not be tolerated.

  
  
  But now vring size is limited to 1024 as follows:
  VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                              void (*handle_output)(VirtIODevice *, VirtQueue *))
  {
      ...
      if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
          abort();
  }
  ps: #define VIRTQUEUE_MAX_SIZE 1024
  I deleted that check and set the vring size to 2048;
  the VM can be started successfully, and the network is OK too.
  So, why is the vring size limited to 1024, and what is the influence?
 
  Thanks!
  
  There are several reasons for this limit.
  First, the guest has to allocate a descriptor buffer which is 16 * vring size.
  With a 1K size that is already 16K, which might be tricky to
  allocate contiguously if memory is fragmented when the device is
  added by hotplug.
 That is very
  The second issue is that we want to be able to implement
  the device on top of the linux kernel, and
  a single descriptor might use all of
  the virtqueue. In this case we won't be able to pass the
  descriptor directly to linux as a single iov, since
  that is limited to 1K entries.
 For the second issue, I wonder if it is OK to set the vring size of virtio-net
 to larger than 1024;
 for networking, there are at most 18 pages for an skb, so it will not exceed the iov limit.
  
  -- 
  Best Wishes!
  Zhang Jie
  .
  
 
 -- 
 Best Wishes!
 Zhang Jie



Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Avi Kivity


On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:

a single descriptor might use all of
 the virtqueue. In this case we won't be able to pass the
descriptor directly to linux as a single iov, since



You could separate maximum request scatter/gather list size from the 
virtqueue size.  They are totally unrelated - even now you can have a 
larger request by using indirect descriptors.





Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Zhangjie (HZ)
MST, Thanks very much, I get it.

On 2014/10/8 15:37, Michael S. Tsirkin wrote:
 On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:
 Thanks for your patient answer! :-)

 On 2014/9/30 17:33, Michael S. Tsirkin wrote:
 On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
 Hi,
 There exists packet loss when we do packet forwarding in a VM,
 especially when we use DPDK to do the forwarding. Enlarging the vring
 can alleviate the problem.

 I think this has to do with the fact that dpdk disables
 checksum offloading; this has the side effect of disabling
 segmentation offloading.

 Please fix dpdk to support checksum offloading, and
 I think the problem will go away.
 In some application scenarios, loss of UDP packets is not allowed,
 and the UDP packets are always shorter than the MTU.
 So, we need to support high-pps forwarding (e.g. 0.3M packets/s), and
 offloading cannot fix that.
 
 That's the point. With UFO you get larger than MTU UDP packets:
 http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo
But the VM only does forwarding, and does not create new packets itself.
As we cannot GRO normal UDP packets, UFO cannot work when UDP packets
come from the host's NIC.
 
 Additionally, checksum offloading reduces CPU utilization
 and reduces the number of data copies, allowing higher pps
 with smaller buffers.
 
 It might look like queue depth helps performance for netperf, but in
 real-life workloads the latency under load will suffer; with more
 protocols implementing tunnelling on top of UDP, such extreme bufferbloat
 will not be tolerated.
 


 But now vring size is limited to 1024 as follows:
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                             void (*handle_output)(VirtIODevice *, VirtQueue *))
 {
     ...
     if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
         abort();
 }
 ps: #define VIRTQUEUE_MAX_SIZE 1024
 I deleted that check and set the vring size to 2048;
 the VM can be started successfully, and the network is OK too.
 So, why is the vring size limited to 1024, and what is the influence?

 Thanks!

 There are several reasons for this limit.
 First, the guest has to allocate a descriptor buffer which is 16 * vring size.
 With a 1K size that is already 16K, which might be tricky to
 allocate contiguously if memory is fragmented when the device is
 added by hotplug.
 That is very
 The second issue is that we want to be able to implement
 the device on top of the linux kernel, and
 a single descriptor might use all of
 the virtqueue. In this case we won't be able to pass the
 descriptor directly to linux as a single iov, since
 that is limited to 1K entries.
 For the second issue, I wonder if it is OK to set the vring size of virtio-net
 to larger than 1024;
 for networking, there are at most 18 pages for an skb, so it will not exceed the iov limit.

 -- 
 Best Wishes!
 Zhang Jie
 .


 -- 
 Best Wishes!
 Zhang Jie
 .
 

-- 
Best Wishes!
Zhang Jie




Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Zhangjie (HZ)


On 2014/10/8 15:43, Avi Kivity wrote:

 
 You could separate maximum request scatter/gather list size from the 
 virtqueue size.  They are totally unrelated - even now you can have a larger 
 request by using indirect descriptors.
Yes, from the code there is no strong correlation between the virtqueue size
and the iov limit.
-- 
Best Wishes!
Zhang Jie




Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Michael S. Tsirkin
On Wed, Oct 08, 2014 at 04:07:47PM +0800, Zhangjie (HZ) wrote:
 MST, Thanks very much, I get it.
 
 On 2014/10/8 15:37, Michael S. Tsirkin wrote:
  On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:
  Thanks for your patient answer! :-)
 
  On 2014/9/30 17:33, Michael S. Tsirkin wrote:
  On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
  Hi,
  There exists packet loss when we do packet forwarding in a VM,
  especially when we use DPDK to do the forwarding. Enlarging the vring
  can alleviate the problem.
 
  I think this has to do with the fact that dpdk disables
  checksum offloading; this has the side effect of disabling
  segmentation offloading.
 
  Please fix dpdk to support checksum offloading, and
  I think the problem will go away.
  In some application scenarios, loss of UDP packets is not allowed,
  and the UDP packets are always shorter than the MTU.
  So, we need to support high-pps forwarding (e.g. 0.3M packets/s), and
  offloading cannot fix that.
  
  That's the point. With UFO you get larger than MTU UDP packets:
  http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo
 But the VM only does forwarding, and does not create new packets itself.
 As we cannot GRO normal UDP packets, UFO cannot work when UDP packets
 come from the host's NIC.

This is something I've been thinking about for a while now.
We really should add a GRO-like path for UDP; this isn't
too different from UDP.

LRO can often work with UDP too, but linux discards too much
info on LRO; if you are doing drivers in userspace
you might be able to support this.

  
  Additionally, checksum offloading reduces CPU utilization
  and reduces the number of data copies, allowing higher pps
  with smaller buffers.
  
  It might look like queue depth helps performance for netperf, but in
  real-life workloads the latency under load will suffer; with more
  protocols implementing tunnelling on top of UDP, such extreme bufferbloat
  will not be tolerated.
  
 
 
  But now vring size is limited to 1024 as follows:
  VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                              void (*handle_output)(VirtIODevice *, VirtQueue *))
  {
      ...
      if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
          abort();
  }
  ps: #define VIRTQUEUE_MAX_SIZE 1024
  I deleted that check and set the vring size to 2048;
  the VM can be started successfully, and the network is OK too.
  So, why is the vring size limited to 1024, and what is the influence?
 
  Thanks!
 
  There are several reasons for this limit.
  First, the guest has to allocate a descriptor buffer which is 16 * vring size.
  With a 1K size that is already 16K, which might be tricky to
  allocate contiguously if memory is fragmented when the device is
  added by hotplug.
  That is very
  The second issue is that we want to be able to implement
  the device on top of the linux kernel, and
  a single descriptor might use all of
  the virtqueue. In this case we won't be able to pass the
  descriptor directly to linux as a single iov, since
  that is limited to 1K entries.
  For the second issue, I wonder if it is OK to set the vring size of virtio-net
  to larger than 1024;
  for networking, there are at most 18 pages for an skb, so it will not exceed
  the iov limit.
 
  -- 
  Best Wishes!
  Zhang Jie
  .
 
 
  -- 
  Best Wishes!
  Zhang Jie
  .
  
 
 -- 
 Best Wishes!
 Zhang Jie



Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Michael S. Tsirkin
On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
 
 On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
 a single descriptor might use all of
 the virtqueue. In this case we won't be able to pass the
 descriptor directly to linux as a single iov, since
 
 
 You could separate maximum request scatter/gather list size from the
 virtqueue size.  They are totally unrelated - even now you can have a larger
 request by using indirect descriptors.

We could add a feature to have a smaller or larger S/G length limit.
Is this something useful?

-- 
MST



Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Avi Kivity


On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:

On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:

On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:

a single descriptor might use all of
the virtqueue. In this case we won't be able to pass the
descriptor directly to linux as a single iov, since


You could separate maximum request scatter/gather list size from the
virtqueue size.  They are totally unrelated - even now you can have a larger
request by using indirect descriptors.

We could add a feature to have a smaller or larger S/G length limit.
Is this something useful?



Having a larger ring size is useful, esp. with zero-copy transmit, and 
you would need the sglist length limit in order to not require 
linearization on linux hosts.  So the limit is not useful in itself, 
only indirectly.


Google cloud engine exposes virtio ring sizes of 16384.

Even more useful is getting rid of the desc array and instead passing 
descs inline in avail and used.





Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Michael S. Tsirkin
On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:
 
 On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
 On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
 a single descriptor might use all of
 the virtqueue. In this case we won't be able to pass the
 descriptor directly to linux as a single iov, since
 
 You could separate maximum request scatter/gather list size from the
 virtqueue size.  They are totally unrelated - even now you can have a larger
 request by using indirect descriptors.
 We could add a feature to have a smaller or larger S/G length limit.
 Is this something useful?
 
 
 Having a larger ring size is useful, esp. with zero-copy transmit, and you
 would need the sglist length limit in order to not require linearization on
 linux hosts.  So the limit is not useful in itself, only indirectly.
 
 Google cloud engine exposes virtio ring sizes of 16384.

OK this sounds useful, I'll queue this up for consideration.
Thanks!

 Even more useful is getting rid of the desc array and instead passing descs
 inline in avail and used.

You expect this to improve performance?
Quite possibly but this will have to be demonstrated.

-- 
MST



Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Avi Kivity


On 10/08/2014 01:14 PM, Michael S. Tsirkin wrote:

On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:

On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:

On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:

On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:

a single descriptor might use all of
the virtqueue. In this case we won't be able to pass the
descriptor directly to linux as a single iov, since


You could separate maximum request scatter/gather list size from the
virtqueue size.  They are totally unrelated - even now you can have a larger
request by using indirect descriptors.

We could add a feature to have a smaller or larger S/G length limit.
Is this something useful?


Having a larger ring size is useful, esp. with zero-copy transmit, and you
would need the sglist length limit in order to not require linearization on
linux hosts.  So the limit is not useful in itself, only indirectly.

Google cloud engine exposes virtio ring sizes of 16384.

OK this sounds useful, I'll queue this up for consideration.
Thanks!


Thanks.


Even more useful is getting rid of the desc array and instead passing descs
inline in avail and used.

You expect this to improve performance?
Quite possibly but this will have to be demonstrated.



The top vhost function in small packet workloads is vhost_get_vq_desc, 
and the top instruction within that (50%) is the one that reads the 
first 8 bytes of desc.  It's a guaranteed cache line miss (and again on 
the guest side when it's time to reuse).


Inline descriptors will amortize the cache miss over 4 descriptors, and 
will allow the hardware to prefetch, since the descriptors are linear in 
memory.
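
For context, a simplified sketch of the split-ring structures involved (layout as in the virtio spec, shown here with plain stdint types), which is why the descriptor read is its own, unamortized cache line:

#include <stdint.h>

struct vring_desc {             /* 16 bytes, in a separate array */
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct vring_avail {            /* written by guest, read by host */
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];            /* only 16-bit indexes into desc[] */
};

struct vring_used_elem {
    uint32_t id;                /* head index of the completed chain */
    uint32_t len;               /* bytes the device wrote */
};

struct vring_used {             /* written by host, read by guest */
    uint16_t flags;
    uint16_t idx;
    struct vring_used_elem ring[];
};

/* Following avail->ring[i] always dereferences into the separate desc[]
 * array, which is the read vhost_get_vq_desc pays for on every request. */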






Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Michael S. Tsirkin
On Wed, Oct 08, 2014 at 01:37:25PM +0300, Avi Kivity wrote:
 
 On 10/08/2014 01:14 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:
 On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
 On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
 a single descriptor might use all of
 the virtqueue. In this case we won't be able to pass the
 descriptor directly to linux as a single iov, since
 
 You could separate maximum request scatter/gather list size from the
 virtqueue size.  They are totally unrelated - even now you can have a 
 larger
 request by using indirect descriptors.
 We could add a feature to have a smaller or larger S/G length limit.
 Is this something useful?
 
 Having a larger ring size is useful, esp. with zero-copy transmit, and you
 would need the sglist length limit in order to not require linearization on
 linux hosts.  So the limit is not useful in itself, only indirectly.
 
 Google cloud engine exposes virtio ring sizes of 16384.
 OK this sounds useful, I'll queue this up for consideration.
 Thanks!
 
 Thanks.
 
 Even more useful is getting rid of the desc array and instead passing descs
 inline in avail and used.
 You expect this to improve performance?
 Quite possibly but this will have to be demonstrated.
 
 
 The top vhost function in small packet workloads is vhost_get_vq_desc, and
 the top instruction within that (50%) is the one that reads the first 8
 bytes of desc.  It's a guaranteed cache line miss (and again on the guest
 side when it's time to reuse).

OK so basically what you are pointing out is that we get 5 accesses:
read of available head, read of available ring, read of descriptor,
write of used ring, write of used ring head.

If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by host as descriptors are
consumed.

Basically get rid of both used and available ring.

Sounds good in theory.
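
A minimal sketch of that in-order, valid-bit scheme (all names are hypothetical; this is not an existing virtio layout):

#include <stdint.h>

#define DESC_F_VALID 0x8000   /* set by guest when published, cleared by host once consumed */

/* A single ring of self-describing entries, with no separate avail/used
 * rings; this only works if the host consumes entries strictly in order. */
struct ordered_desc {
    uint64_t addr;    /* guest-physical buffer address */
    uint32_t len;     /* buffer length */
    uint16_t flags;   /* DESC_F_VALID plus direction/chaining bits */
    uint16_t id;      /* tag the guest uses to complete the request */
};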

 Inline descriptors will amortize the cache miss over 4 descriptors, and will
 allow the hardware to prefetch, since the descriptors are linear in memory.

If descriptors are used in order (as they are with current qemu)
then aren't they amortized already?

-- 
MST



Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Avi Kivity


On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:

Even more useful is getting rid of the desc array and instead passing descs
inline in avail and used.

You expect this to improve performance?
Quite possibly but this will have to be demonstrated.


The top vhost function in small packet workloads is vhost_get_vq_desc, and
the top instruction within that (50%) is the one that reads the first 8
bytes of desc.  It's a guaranteed cache line miss (and again on the guest
side when it's time to reuse).

OK so basically what you are pointing out is that we get 5 accesses:
read of available head, read of available ring, read of descriptor,
write of used ring, write of used ring head.


Right.  And only read of descriptor is not amortized.


If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by host as descriptors are
consumed.

Basically get rid of both used and available ring.


That only works if you don't allow reordering, which is never the case 
for block, and not the case for zero-copy net.  It also has writers on 
both sides of the ring.


The right design is to keep avail and used, but instead of making them 
rings of pointers to descs, make them rings of descs.


The host reads descs from avail, processes them, then writes them back 
on used (possibly out-of-order).  The guest writes descs to avail and 
reads them back from used.


You'll probably have to add a 64-bit cookie to desc so you can complete 
without an additional lookup.
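
A rough sketch of that layout (hypothetical names, not an existing virtio structure), with the descriptors themselves, cookie included, stored in the avail and used rings:

#include <stdint.h>

/* Descriptor carried inline in the rings; the cookie is an opaque
 * driver value (e.g. a pointer to the skbuff or bio) returned verbatim
 * on completion, so no extra lookup is needed. */
struct inline_desc {
    uint64_t addr;     /* guest-physical buffer address */
    uint32_t len;      /* buffer length */
    uint16_t flags;    /* direction, chaining, etc. */
    uint16_t pad;
    uint64_t cookie;
};

/* Avail and used are both rings of full descriptors rather than rings
 * of indexes: the guest writes requests into avail, and the host writes
 * completed descriptors back into used, possibly out of order. */
struct inline_ring {
    uint16_t idx;                 /* producer index */
    struct inline_desc ring[];
};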




Sounds good in theory.


Inline descriptors will amortize the cache miss over 4 descriptors, and will
allow the hardware to prefetch, since the descriptors are linear in memory.

If descriptors are used in order (as they are with current qemu)
then aren't they amortized already?






Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Avi Kivity


On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:



Inline descriptors will amortize the cache miss over 4 descriptors, and will
allow the hardware to prefetch, since the descriptors are linear in memory.

If descriptors are used in order (as they are with current qemu)
then aren't they amortized already?



The descriptors are only in-order for non-zero-copy net.  They are out 
of order for block and zero-copy net.


(also, the guest has to be careful in how it allocates descriptors).



Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Michael S. Tsirkin
On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:
 
 On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
 Even more useful is getting rid of the desc array and instead passing 
 descs
 inline in avail and used.
 You expect this to improve performance?
 Quite possibly but this will have to be demonstrated.
 
 The top vhost function in small packet workloads is vhost_get_vq_desc, and
 the top instruction within that (50%) is the one that reads the first 8
 bytes of desc.  It's a guaranteed cache line miss (and again on the guest
 side when it's time to reuse).
 OK so basically what you are pointing out is that we get 5 accesses:
 read of available head, read of available ring, read of descriptor,
 write of used ring, write of used ring head.
 
 Right.  And only read of descriptor is not amortized.
 
 If processing is in-order, we could build a much simpler design, with a
 valid bit in the descriptor, cleared by host as descriptors are
 consumed.
 
 Basically get rid of both used and available ring.
 
 That only works if you don't allow reordering, which is never the case for
 block, and not the case for zero-copy net.  It also has writers on both sides
 of the ring.
 
 The right design is to keep avail and used, but instead of making them rings
 of pointers to descs, make them rings of descs.
 
 The host reads descs from avail, processes them, then writes them back on
 used (possibly out-of-order).  The guest writes descs to avail and reads
 them back from used.
 
 You'll probably have to add a 64-bit cookie to desc so you can complete
 without an additional lookup.

My old presentation from 2012 or so suggested something like this.
We don't need a 64 bit cookie I think - a small 16 bit one
should be enough.

 
 Sounds good in theory.
 
 Inline descriptors will amortize the cache miss over 4 descriptors, and will
 allow the hardware to prefetch, since the descriptors are linear in memory.
 If descriptors are used in order (as they are with current qemu)
 then aren't they amortized already?
 



Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Avi Kivity


On 10/08/2014 03:22 PM, Michael S. Tsirkin wrote:

On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:

On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:

Even more useful is getting rid of the desc array and instead passing descs
inline in avail and used.

You expect this to improve performance?
Quite possibly but this will have to be demonstrated.


The top vhost function in small packet workloads is vhost_get_vq_desc, and
the top instruction within that (50%) is the one that reads the first 8
bytes of desc.  It's a guaranteed cache line miss (and again on the guest
side when it's time to reuse).

OK so basically what you are pointing out is that we get 5 accesses:
read of available head, read of available ring, read of descriptor,
write of used ring, write of used ring head.

Right.  And only read of descriptor is not amortized.


If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by host as descriptors are
consumed.

Basically get rid of both used and available ring.

That only works if you don't allow reordering, which is never the case for
block, and not the case for zero-copy net.  It also has writers on both sides
of the ring.

The right design is to keep avail and used, but instead of making them rings
of pointers to descs, make them rings of descs.

The host reads descs from avail, processes them, then writes them back on
used (possibly out-of-order).  The guest writes descs to avail and reads
them back from used.

You'll probably have to add a 64-bit cookie to desc so you can complete
without an additional lookup.

My old presentation from 2012 or so suggested something like this.
We don't need a 64 bit cookie I think - a small 16 bit one
should be enough.



A 16 bit cookie means you need an extra table to hold the real request 
pointers.


With a 64-bit cookie you can store a pointer to the skbuff or bio in the 
ring itself, and avoid the extra lookup.


The extra lookup isn't the end of the world, since it doesn't cross core 
boundaries, but it's worth avoiding.





Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-10-08 Thread Avi Kivity


On 10/08/2014 03:28 PM, Avi Kivity wrote:

My old presentation from 2012 or so suggested something like this.

We don't need a 64 bit cookie I think - a small 16 bit one
should be enough.



A 16 bit cookie means you need an extra table to hold the real request 
pointers.


With a 64-bit cookie you can store a pointer to the skbuff or bio in 
the ring itself, and avoid the extra lookup.


The extra lookup isn't the end of the world, since it doesn't cross core 
boundaries, but it's worth avoiding.




What you can do is have two types of descriptors: head and fragment

#include <stdint.h>
typedef uint16_t u16;
typedef uint64_t u64;

/* Both arms are 12 bytes when packed, which is where the
 * 12*(nfrags+1) request length below comes from. */
union desc {
    struct head {
        u16 nfrags;    /* number of frag descriptors that follow */
        u16 flags;
        u64 cookie;    /* opaque completion value */
    } __attribute__((packed)) head;
    struct frag {
        u64 paddr;     /* guest-physical address of the fragment */
        u16 flen;      /* fragment length */
        u16 flags;
    } __attribute__((packed)) frag;
};

so now a request length is 12*(nfrags+1).

You can be evil and steal some bits from paddr/cookie, and have each 
descriptor 8 bytes long.


btw, I also recommend storing things like vnet_hdr in the ring itself, 
instead of out-of-line in memory.  Maybe the ring should just transport 
bytes and let the upper layer decide how it's formatted.






Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?

2014-09-30 Thread Michael S. Tsirkin
On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
 Hi,
 There exists packet loss when we do packet forwarding in a VM,
 especially when we use DPDK to do the forwarding. Enlarging the vring
 can alleviate the problem.

I think this has to do with the fact that dpdk disables
checksum offloading; this has the side effect of disabling
segmentation offloading.

Please fix dpdk to support checksum offloading, and
I think the problem will go away.


 But now vring size is limited to 1024 as follows:
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                             void (*handle_output)(VirtIODevice *, VirtQueue *))
 {
     ...
     if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
         abort();
 }
 ps: #define VIRTQUEUE_MAX_SIZE 1024
 I deleted that check and set the vring size to 2048;
 the VM can be started successfully, and the network is OK too.
 So, why is the vring size limited to 1024, and what is the influence?
 
 Thanks!

There are several reasons for this limit.
First, the guest has to allocate a descriptor buffer which is 16 * vring size.
With a 1K size that is already 16K, which might be tricky to
allocate contiguously if memory is fragmented when the device is
added by hotplug.
The second issue is that we want to be able to implement
the device on top of the linux kernel, and
a single descriptor might use all of
the virtqueue. In this case we won't be able to pass the
descriptor directly to linux as a single iov, since
that is limited to 1K entries.

 -- 
 Best Wishes!
 Zhang Jie